
[Discussion/Question] Weak Controllability on Training Set with LoRA Finetuning (POV Motion Dataset) #63

@J1dan

Context

I am finetuning the 2B model with LoRA on a dataset of first-person POV videos. My goal is specific motion control, but controllability is poor, even when I test against the training samples.

Dataset & Training Details

  • Source Data: 151 POV videos.

  • Augmentation: I mirrored videos containing turning motions, bringing the total dataset to 206 videos.

  • Video Length: 121 frames per video.

  • Captions/Annotations: The dataset uses sparse captions that focus strictly on motion primitives (e.g., "turning left," "moving forward") rather than dense, detailed visual descriptions. Example captions: "Head right towards the seat." and "Turn left and go straight."

  • Training Method: LoRA with the default config file, trained for 10 epochs.
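For reference, here is a minimal sketch of how the mirroring step could keep captions consistent with the flipped frames. Flipping a "turn left" clip horizontally produces a right turn, so the caption has to change too. The description above does not say how captions were handled for mirrored clips. The function names `mirror_caption` and `mirror_frame` are hypothetical and not part of the repository, and the sketch assumes captions use the plain words "left" and "right" only as directions.

```python
import re

# Direction words to exchange when a clip is flipped horizontally.
_SWAP = {"left": "right", "right": "left"}
_PATTERN = re.compile(r"\b(left|right)\b", re.IGNORECASE)


def mirror_caption(caption: str) -> str:
    """Swap 'left' and 'right' so the caption matches a mirrored clip."""
    def _swap(match: re.Match) -> str:
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        # Preserve the capitalization of the original word.
        if word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped

    return _PATTERN.sub(_swap, caption)


def mirror_frame(frame):
    """Flip one frame horizontally; the frame is a list of pixel rows.

    With a NumPy array of shape (T, H, W, C), the equivalent for a whole
    clip would be frames[:, :, ::-1, :].
    """
    return [row[::-1] for row in frame]


if __name__ == "__main__":
    print(mirror_caption("Turn left and go straight."))
    print(mirror_caption("Head right towards the seat."))
```

Note that this simple word swap would also alter non-directional uses such as "right away", so captions with such phrasing would need manual review.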

The Issue

After finetuning, the model exhibits weak controllability. This is most noticeable for turning motions: the model fails to follow the prompt even when generating samples from the training set, i.e., it does not even overfit the training data.
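For scale, the training budget implied by the numbers above can be worked out with a short calculation. This is only a sketch: the effective batch size of the default config is not stated here, so the values passed below are placeholders, and the helper name `optimizer_steps` is hypothetical. It assumes the last partial batch of each epoch is kept.

```python
import math


def optimizer_steps(num_videos: int, epochs: int, effective_batch_size: int) -> int:
    """Number of optimizer updates, keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_videos / effective_batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 206 videos and 10 epochs come from the description above;
    # the batch sizes are assumed placeholders.
    for batch_size in (1, 4, 8):
        print(batch_size, optimizer_steps(206, 10, batch_size))
```

Each video is seen only 10 times regardless of batch size, which may be relevant when judging whether the run was long enough to overfit.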

Hypothesis & Questions

I suspect the issue might stem from the caption density or dataset size, but I would value your insights on the following:

  • Caption Granularity: My captions describe only motion primitives. Does this model require detailed visual descriptions (background, objects, lighting) in the prompt to learn the motion-text alignment effectively?

  • Dataset Size: Is ~200 videos generally considered insufficient for this specific model architecture to learn temporal dynamics like turning?

Any advice on data preparation or hyperparameter tuning for this specific use case would be greatly appreciated. Thanks!
