-
Notifications
You must be signed in to change notification settings - Fork 64
Description
Image-based filtering. We select a subset of examples whose visual content overlaps with ImageNet
classes. After applying English language (fasttext) and caption length filtering, we cluster the
image embeddings extracted by the OpenAI ViT-L/14 model for each image into 100K groups using
Faiss [ 75]. We then find the nearest neighbor group for every ImageNet training example, and keep
examples belonging to these groups. We apply this procedure using either ImageNet-21K (14M
images) or ImageNet-1K (1.2M images), forming two subsets.
In the paper, regarding the composition of "Image filters", it mentions that either ImageNet-21K or ImageNet-1K can be used. Looking into the code however, especially for the Datacomp 1B, it looks like only IN1K is used. Is there a version of the Datacomp 1B with IN21K?