[Discussion/Question] Weak Controllability on Training Set with LoRA Finetuning (POV Motion Dataset)

**Context** I am finetuning the 2B model with LoRA on a dataset of first-person (POV) videos. My goal is specific motion control, but controllability is poor even when I test on the training samples.

**Dataset & Training Details**

- Source Data: 151 POV videos.

- Augmentation: I mirrored videos containing turning motions, bringing the total dataset to 206 videos.

- Video Length: 121 frames per video.

- Captions/Annotations: The dataset uses sparse captions that describe only motion primitives (e.g., "turning left", "moving forward") rather than dense, detailed visual descriptions. Example captions: "Head right towards the seat." and "Turn left and go straight."

- Training Method: LoRA with the default config file, trained for 10 epochs.
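For clarity on the augmentation step, here is a minimal sketch of what mirroring a turning clip involves. It is illustrative only and is not the exact script I used. The helper names (`mirror_caption`, `mirror_video`, `mirror_sample`) are hypothetical, and frames are assumed to be nested lists of shape (T, H, W) for simplicity. The point it illustrates is that a horizontal flip turns a left turn into a right turn, so the direction words in the caption need to be swapped together with the pixels.

```python
import re

# Direction words that change meaning under a horizontal flip.
_SWAP = {"left": "right", "right": "left"}


def mirror_caption(caption: str) -> str:
    """Swap 'left' and 'right' in a caption, preserving capitalization.

    Caveat: this is a plain word swap, so a non-directional use such as
    "right away" would also be changed and needs manual review.
    """
    def repl(match):
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        if word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped

    return re.sub(r"\b(left|right)\b", repl, caption, flags=re.IGNORECASE)


def mirror_video(frames):
    """Flip every frame horizontally by reversing each pixel row.

    Assumes frames is a nested sequence indexed as frames[t][y][x].
    """
    return [[row[::-1] for row in frame] for frame in frames]


def mirror_sample(frames, caption):
    """Return the mirrored clip together with its direction-swapped caption."""
    return mirror_video(frames), mirror_caption(caption)
```

With array-based frames the flip would be a single slice on the width axis instead of the nested list comprehension. If the caption is left unchanged while the video is flipped, the mirrored clips carry labels that contradict their motion, which would directly hurt left/right controllability.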

**The Issue** After finetuning, the model exhibits weak controllability. This is most noticeable in turning motions, where the model fails to follow the prompt even when the prompt is taken from the training set. In other words, it does not even overfit to the training samples.

**Hypothesis & Questions** I suspect the issue might stem from the caption density or dataset size, but I would value your insights on the following:

- Caption Granularity: My captions only describe motion (primitives). Does this model require detailed visual descriptions (background, objects, lighting) in the prompt to learn the motion-text alignment effectively?

- Dataset Size: Is ~200 videos generally considered insufficient for this specific model architecture to learn temporal dynamics like turning?

Any advice on data preparation or hyperparameter tuning for this specific use case would be greatly appreciated. Thanks!

[Discussion/Question] Weak Controllability on Training Set with LoRA Finetuning (POV Motion Dataset) #63

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Discussion/Question] Weak Controllability on Training Set with LoRA Finetuning (POV Motion Dataset) #63

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions