First of all, thank you so much for your work and for open-sourcing it!
Currently, we have a Franka Research 3 set up to solve a single task. We first tried different out-of-the-box policies, which, as expected, resulted in a 0% success rate. We then implemented a script to extract demonstrations of this specific task from the DROID dataset and fine-tuned OpenVLA on them. The robot performed much better but still could not solve the task even once. We then recorded some demonstrations ourselves to fine-tune a new OpenVLA checkpoint, which led us to the following questions:
- Could we improve the success rate by fine-tuning on both the DROID-extracted demonstrations and the recorded demonstrations, which both solve the same task?
- If we wanted to fine-tune a checkpoint on both sets of demonstrations, what would be the ideal way to do so, given that the magnitudes of the actions differ greatly between the two sets?
  - Should we fine-tune on the DROID demonstrations first, and then fine-tune the resulting checkpoint on our own demonstrations?
  - Or would it be better to create a mixed dataset combining both sets of demonstrations?
  - If so, would it make sense to normalize the actions of both sets before fine-tuning, to close the gap in action scale between them? (A rough sketch of what we mean is below.)
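To make the last point concrete, here is a rough sketch (plain NumPy, with hypothetical array shapes and function names of our own, not from the OpenVLA codebase) of what we have in mind: normalizing each set with its own 1st/99th-percentile statistics so both land in [-1, 1], and only then mixing them.

```python
import numpy as np

def compute_action_stats(actions: np.ndarray) -> dict:
    """Per-dimension 1st/99th percentile statistics for one demonstration set.

    `actions` is assumed to be an (N, action_dim) array of raw actions.
    """
    return {
        "q01": np.percentile(actions, 1, axis=0),
        "q99": np.percentile(actions, 99, axis=0),
    }

def normalize_actions(actions: np.ndarray, stats: dict) -> np.ndarray:
    """Map actions into [-1, 1] using the per-set statistics."""
    q01, q99 = stats["q01"], stats["q99"]
    scaled = 2.0 * (actions - q01) / np.maximum(q99 - q01, 1e-8) - 1.0
    return np.clip(scaled, -1.0, 1.0)

# Placeholder arrays standing in for the two demonstration sets.
droid_actions = np.random.uniform(-0.05, 0.05, size=(1000, 7))  # e.g. small action deltas
own_actions = np.random.uniform(-0.5, 0.5, size=(200, 7))       # e.g. larger-scale actions

# Normalize each set with its *own* statistics so both end up on the same scale...
droid_norm = normalize_actions(droid_actions, compute_action_stats(droid_actions))
own_norm = normalize_actions(own_actions, compute_action_stats(own_actions))

# ...then concatenate into a single mixed fine-tuning dataset.
mixed_actions = np.concatenate([droid_norm, own_norm], axis=0)
```

Would something along these lines be reasonable, or is there a better way to reconcile the two action distributions before fine-tuning?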
Thank you very much. I would appreciate any input on this.