Hello authors, are there any example datasets for model training? If not, how can I generate input files in the correct format for training the model?