Conversation

@danielmend
Contributor

No description provided.

@iejMac
Collaborator

iejMac commented Oct 22, 2022

Thinking out loud:

We want to support:

  • single-segment multi-caption
  • multi-segment single-caption

Not sure if we should put effort into multi-segment multi-caption. I haven't seen any examples of it, and it seems like it's just bad organization of your dataset, so we don't really care about that.

@iejMac
Collaborator

iejMac commented Oct 22, 2022

Looked over the PR and have some thoughts but I'll put the overarching idea here:

I think we shouldn't have people write dataset-specific process_segment functions, but rather organize the "segment" and "caption" columns of their csv or parquet so the information works with our general process_segment function.

For example, if I know video 1 has segments from 0s-25s, 25s-50s, and 50s-75s with cap1, cap2, cap3, I should make my csv like so:

segments: "(0,25),(25,50),(50,75)"
caption: "cap1;cap2;cap3"

Then when our general process_segment function sees these columns, it takes the embeddings, chops them up according to the segments, and chops the captions up as well.
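
For illustration, here is a minimal sketch of how such a row could be parsed into (segment, caption) pairs; the ast-based parsing and the exact column names are just an assumption following the example above, not the actual clip-video-encode code:

```python
import ast

row = {
    "segments": "(0,25),(25,50),(50,75)",
    "caption": "cap1;cap2;cap3",
}

segments = ast.literal_eval(row["segments"])  # ((0, 25), (25, 50), (50, 75))
captions = row["caption"].split(";")          # ["cap1", "cap2", "cap3"]

assert len(segments) == len(captions)
for (start, end), cap in zip(segments, captions):
    print(f"{start}s-{end}s -> {cap}")
```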

if segment:
    samp += 1
if segment:
    segments = batch["meta"]["times"]  # change to ...['segment']

Collaborator

update to 'segment'

for c in cap.split(";"):  # multiple captions separated by ;
    toks.append(open_clip.tokenize(c))
    ground_truth.append(samp)
if segment:

Collaborator

samp += segment



-def retrieval_evaluation(model_video, model_text, data, multicaption=False):
+def retrieval_evaluation(model_video, model_text, data, multicaption=False, segment=False, process_segments=None):

Collaborator

remove the process_segments and segment args; instead just check for the "segments" key in batch["meta"]


return out

def process_didemo_segments(embeddings, segments, seq_len=200):

Collaborator

no function, no specific code for each dataset, just move these operations into the 'if "segments" in batch["meta"]' part of the retrieval eval
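
As a rough, standalone sketch of what could live in that branch (the dummy batch, the one-embedding-per-second 512-d video features, and the mean pooling are illustrative assumptions, not the PR's code):

```python
import torch
import open_clip

# dummy per-frame embeddings for one 75 s video (assume 1 embedding per second, 512-d)
video_embeddings = torch.randn(75, 512)
batch = {"meta": {"segments": ((0, 25), (25, 50), (50, 75)),
                  "caption": "cap1;cap2;cap3"}}

toks, seg_features, ground_truth, samp = [], [], [], 0
if "segments" in batch["meta"]:
    segments = batch["meta"]["segments"]
    captions = batch["meta"]["caption"].split(";")  # multiple captions separated by ;
    for (start, end), cap in zip(segments, captions):
        seg_features.append(video_embeddings[start:end].mean(dim=0))  # pool each segment
        toks.append(open_clip.tokenize(cap))
        ground_truth.append(samp)
        samp += 1
```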


# Pyre type checker
.pyre/
CLIP-DiDeMo/

Collaborator

cleanup

import open_clip
import torch

sys.path.insert(1, '/Users/daniel/Desktop/LAION_Videoclip/clip-video-encode')

Collaborator

cleanup


with torch.no_grad():
    for i, batch in enumerate(dataloader):
        if i==3:

Collaborator

cleanup

toks.append(open_clip.tokenize(cap))
ground_truth.append(samp)
samp += 1

Collaborator

cleanup

all_video_features.append(video_embeddings.cpu())
all_text_features.append(text_embeddings.cpu())

Collaborator

cleanup
