Continual pretraining data

Your research is highly valuable to the community, and I believe that access to the continual pretraining data you used would greatly accelerate further research in this field. As you know, creating high-quality, large-scale continual pretraining datasets is a significant challenge.

I was hoping you might consider open-sourcing the dataset you used for your work. Sharing this data would be a tremendous contribution and would allow others to build upon your foundation, reproduce your results, and explore new research avenues more effectively.
