Could combining datasets improve success rates? #67

@ehaller00

Description

First of all, thank you so much for your work and for open-sourcing it!

Currently, we have a Franka Research 3 set up to solve a single task. We first tried several out-of-the-box policies, which resulted in an expected 0% success rate. Then, we implemented a script to extract demonstrations of the specific task from the DROID dataset and fine-tuned OpenVLA on these demonstrations. The robot performed much better but still could not solve the task even once. Finally, we recorded some demonstrations ourselves and trained a new OpenVLA checkpoint on them.
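
For reference, the extraction step roughly follows the pattern below. This is a simplified sketch, assuming the RLDS/TFDS build of DROID with a per-step `language_instruction` field; the dataset name, data path, and task keywords are placeholders rather than our actual script:

```python
# Simplified sketch: pull task-specific episodes out of a DROID RLDS/TFDS build.
# The dataset name, data_dir, field names, and keywords are placeholders.
import tensorflow_datasets as tfds

TASK_KEYWORDS = ("open the drawer",)  # placeholder task description

def episode_matches_task(episode) -> bool:
    """Return True if any step's language instruction mentions the task."""
    for step in episode["steps"]:
        instruction = step["language_instruction"].numpy().decode("utf-8")
        if any(kw in instruction.lower() for kw in TASK_KEYWORDS):
            return True
    return False

builder = tfds.builder("droid", data_dir="/path/to/droid")  # placeholder path
dataset = builder.as_dataset(split="train")

task_episodes = [ep for ep in dataset if episode_matches_task(ep)]
print(f"Kept {len(task_episodes)} task-relevant episodes for fine-tuning")
```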

This led us to the following questions:

  • Could we improve the success rate by fine-tuning on both the DROID-extracted demonstrations and the recorded demonstrations, which both solve the same task?
  • If we wanted to fine-tune a checkpoint on both sets of demonstrations, what would be the ideal way to do so, given that the action magnitudes differ greatly between the two sets?
    • Should we fine-tune on the DROID demonstrations first, then fine-tune the resulting checkpoint on our demonstrations?
    • Or would it be better to create a mixed dataset combining both sets of demonstrations?
      • If so, would it make sense to normalize the actions of both sets before fine-tuning to close the gap in action scale between them (see the sketch after this list)?
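
On the normalization point, here is a minimal sketch of the kind of per-dataset quantile normalization we have in mind (to our understanding, OpenVLA normalizes actions per dataset using 1st/99th-percentile bounds mapped to roughly [-1, 1]). It assumes each dataset is a list of trajectories with a `(T, action_dim)` "actions" array; the `droid_demos` and `recorded_demos` variables and all field names are placeholders:

```python
# Minimal sketch: normalize each dataset's actions with its OWN quantile
# statistics so both sets land on a comparable [-1, 1] scale before mixing.
# Trajectory structure and variable names are placeholders.
import numpy as np

def compute_action_stats(trajectories):
    """Stack all actions in one dataset and take per-dimension 1st/99th percentiles."""
    all_actions = np.concatenate([traj["actions"] for traj in trajectories], axis=0)
    return np.percentile(all_actions, 1, axis=0), np.percentile(all_actions, 99, axis=0)

def normalize_actions(trajectories, q01, q99, eps=1e-8):
    """Map actions to roughly [-1, 1] using the dataset's own quantile bounds."""
    out = []
    for traj in trajectories:
        a = np.asarray(traj["actions"], dtype=np.float32)
        a = 2.0 * (a - q01) / (q99 - q01 + eps) - 1.0
        out.append({**traj, "actions": np.clip(a, -1.0, 1.0)})
    return out

# Normalize each set with its own statistics, then combine into one mixed dataset.
droid_q01, droid_q99 = compute_action_stats(droid_demos)     # placeholder variable
ours_q01, ours_q99 = compute_action_stats(recorded_demos)    # placeholder variable
mixed_dataset = (normalize_actions(droid_demos, droid_q01, droid_q99)
                 + normalize_actions(recorded_demos, ours_q01, ours_q99))
```

(We realize that whichever statistics are used for normalization at training time would also have to be used to de-normalize the predicted actions when running the policy on the robot.)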

Thank you very much. I appreciate all input on this.
