About the ActivityNet Captions dataset in CLIP-ViP #41

Hello, thank you for sharing your excellent work!
I have reproduced the results on MSR-VTT and even obtained a higher result than the one reported in the paper.

But when I tried to reproduce the results on ActivityNet Captions, I found that in `actnet_retrieval_vip_base_32.json` the vision format is set to `frame` instead of `video`. I tried reproducing with the vision format set to `video` and 32 sampled frames, and the final result is only just under R@1 = 20.
I then used the OpenCV library to extract frames, but it still can't reach the result in the paper.
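
For reference, here is a minimal sketch of the kind of OpenCV frame extraction described above, assuming 32 frames sampled uniformly from each video. The helper names `uniform_frame_indices` and `extract_frames`, the segment-midpoint sampling, and the JPEG output layout are all assumptions made for illustration; they are not taken from the CLIP-ViP repository and may differ from how the frames used in the paper were prepared.

```python
from pathlib import Path
from typing import List


def uniform_frame_indices(total_frames: int, num_samples: int = 32) -> List[int]:
    """Pick num_samples indices spread evenly over the video.

    Assumption: one frame from the midpoint of each equal-length segment.
    Videos shorter than num_samples frames yield repeated indices.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    seg = total_frames / num_samples
    return [min(total_frames - 1, int(seg * i + seg / 2)) for i in range(num_samples)]


def extract_frames(video_path: str, out_dir: str, num_samples: int = 32) -> int:
    """Save uniformly sampled frames as JPEGs and return how many were written."""
    import cv2  # imported here so the index helper works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    saved = 0
    for idx in uniform_frame_indices(total, num_samples):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the target frame
        ok, frame = cap.read()
        if not ok:
            continue  # skip frames that fail to decode
        cv2.imwrite(str(Path(out_dir) / f"{saved:04d}.jpg"), frame)
        saved += 1

    cap.release()
    return saved
```

Differences in the sampling strategy, the reported frame count, or the image encoding between a script like this and the original preprocessing could each plausibly affect retrieval accuracy, so these are worth comparing when the numbers do not match.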

