Hello, thank you very much for sharing your paper and code; I truly appreciate your work on this project.
While attempting to reproduce the results on the WebVid dataset, I had difficulty obtaining some of the query sets used in the code. Specifically, the `Webvid` class in `dataset.py` is defined as follows:
```python
class Webvid(AnnDatasetSelfTrain):
    # https://zenodo.org/records/11090378
    # {'nb': 2495000, 'dim': 512, 'train_queries': (1000000, 512), 'train_gts': (1000000, 50), 'test_queries': (10000, 512), 'test_gts': (10000, 50), 'self_train_gts': (2495000, 50), 'self_test_queries': (10000, 512), 'self_test_gts': (10000, 50)}
    # OOD
    # l2-normalized, l2 = ip
    def __init__(self, train_set_len = -1, self_train_set_len = -1, path = "/media/nd/webvid_split") -> None:
        super().__init__("webvid-2.5M", "ip", path, train_set_len, self_train_set_len)
        self.base_fn = 'clip.webvid.base.2.5M.fbin'  # Video
        self.train_query_fn = 'webvid.query.train.2.5M.fbin'  # Text
        self.train_gt_fn = 'webvid-2.5M.learn_ip.2.5M.ibin'
        self.test_query_fn = 'webvid.query.10k.fbin'  # Text
        self.test_gt_fn = 'webvid-2.5M.10k.ibin'
        self.self_test_query_fn = 'self_webvid.query.10k.fbin'
        self.self_test_gt_fn = 'self_webvid-2.5M.self_learn_ip.2.5M.ibin'
        self.self_train_gt_fn = 'webvid-2.5M.self_learn_ip.2.5M.ibin'
```

Following the provided Zenodo link, I was able to download some of the files, but a few seem to be missing. To clarify my understanding:
1. `webvid-2.5M.learn_ip.2.5M.ibin`
   - I computed this directly, using `webvid.query.train.2.5M.fbin` as the queries and `clip.webvid.base.2.5M.fbin` as the base.
2. `webvid-2.5M.self_learn_ip.2.5M.ibin`
   - I computed this using `webvid.query.train.2.5M.fbin` as both the queries and the base.
3. `self_webvid.query.10k.fbin`
   - This part is unclear to me.
   - My understanding is that these should be queries with the same modality as the base, but according to the paper, “in-distribution queries and out-of-distribution queries, denoted as $Q_i$ and $Q_o$, respectively, which are independent of the training data $T_i$ and $T_o$”.
   - Therefore, it seems this query set should be different from the ones already used in training.
   - Could you provide guidance on how to obtain this file, or could it be shared directly?
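For concreteness, here is a minimal sketch of the brute-force computation described in (1) and (2). It assumes the big-ann-benchmarks binary layout for `.fbin`/`.ibin` files (a uint32 row count and a uint32 column count, followed by row-major float32 vectors or int32 ids), top-50 neighbors by inner product as suggested by the `(N, 50)` shapes in the `dataset.py` comment, and 0-based ids. The helper names `read_fbin`, `write_ibin`, and `exact_topk_ip` are hypothetical, not taken from the repo.

```python
import numpy as np

# Assumption: big-ann-benchmarks layout, i.e. a header of two uint32 values
# (number of rows, number of columns) followed by row-major data.


def read_fbin(path: str) -> np.ndarray:
    """Load a .fbin file as a (rows, dim) float32 array (layout assumed above)."""
    with open(path, "rb") as f:
        n, d = np.fromfile(f, dtype=np.uint32, count=2)
        data = np.fromfile(f, dtype=np.float32, count=int(n) * int(d))
    return data.reshape(int(n), int(d))


def write_ibin(path: str, ids: np.ndarray) -> None:
    """Save a (rows, k) id matrix as .ibin: uint32 header, then int32 ids."""
    ids = np.ascontiguousarray(ids, dtype=np.int32)
    with open(path, "wb") as f:
        np.array(ids.shape, dtype=np.uint32).tofile(f)
        ids.tofile(f)


def exact_topk_ip(queries: np.ndarray, base: np.ndarray,
                  k: int = 50, batch: int = 128) -> np.ndarray:
    """Exact top-k base ids per query by inner product, best first.

    Queries are processed in batches; each batch materializes a
    (batch, len(base)) float32 score matrix, roughly 1.3 GB for
    batch=128 against 2,495,000 base vectors. Requires k <= len(base).
    """
    out = np.empty((queries.shape[0], k), dtype=np.int32)
    for start in range(0, queries.shape[0], batch):
        scores = queries[start:start + batch] @ base.T
        # Negate in place so that ascending order means descending inner product.
        np.negative(scores, out=scores)
        top = np.argpartition(scores, k - 1, axis=1)[:, :k]  # unordered top-k
        order = np.argsort(np.take_along_axis(scores, top, axis=1), axis=1)
        out[start:start + batch] = np.take_along_axis(top, order, axis=1)
    return out
```

For (1) this would be `exact_topk_ip(read_fbin("webvid.query.train.2.5M.fbin"), read_fbin("clip.webvid.base.2.5M.fbin"))`; for (2), the same array is passed as both arguments. The result has one row per query vector, so comparing its row count with the shapes listed in the `dataset.py` comment (`train_gts`: (1000000, 50); `self_train_gts`: (2495000, 50), the same as `nb`) is a cheap consistency check. At the full 1M × 2.5M scale a GPU, or faiss's exact `IndexFlatIP`, is the more practical choice.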
Could you kindly confirm whether my approach for (1) and (2) is correct, and either explain how (3) was generated or share the file directly?
Thank you very much for your time and support.