TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
This is the official repository of TROVE (ACL 2025).
├── Part1_origin_data/ # Raw dataset
├── Part2_human_annotated/ # Human-annotated dataset
├── Part3_annotated_result/ # Merged human annotations and preliminary retrieval results
├── Part4_evaluation_data/ # Evaluation data for models (including prompts), with texts segmented into sentences
├── step1_process_annotation.py # Data processing, step 1: generates "Part3_annotated_result"
├── step2_generate_evaluation.py # Data processing, step 2: generates "Part4_evaluation_data"
├── inference_close_model.py # Inference for closed-source models
├── inference_open_model.py # Inference for open-source models
└── metrics.py # Computes the evaluation metrics
Please install the following:
- python 3.10
- torch 2.3.1
- transformers 4.40.2
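The pinned versions above can be installed with pip; a minimal sketch, assuming a fresh virtual environment (adjust the torch build to your CUDA setup):

```shell
# Create an isolated environment and install the pinned dependencies.
python3.10 -m venv .venv
source .venv/bin/activate
pip install torch==2.3.1 transformers==4.40.2
```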
Due to file size limits, the data folders are hosted on Google Drive.
Setup: unzip the downloaded archive and place the extracted subfolders directly in the root of this repository.
The data in "Part4_evaluation_data" is our complete dataset and can be directly used for model evaluation.
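To get oriented before running evaluation, you can inspect the top-level structure of a data file. A minimal sketch; the path passed in the commented call is purely hypothetical, so check the actual files for the real names and schema:

```python
# Hypothetical helper for inspecting an evaluation file's structure.
import json
from pathlib import Path

def peek(path):
    """Load a JSON file and report its top-level structure."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        print(f"{len(data)} records; first record keys: {sorted(data[0])}")
    else:
        print(f"top-level keys: {sorted(data)}")
    return data

# peek("Part4_evaluation_data/example.json")  # hypothetical filename
```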
- For open-source models:
sh ./open_infer.sh
- For closed-source models:
sh ./close_infer.sh
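Both inference scripts produce predictions that are then scored with metrics.py. As an illustration of one plausible sentence-level tracing metric, here is a minimal sketch of set-based precision/recall/F1 over source-sentence indices; this is an assumption for illustration, not necessarily what metrics.py implements:

```python
# Sketch: set-based precision/recall/F1 between predicted and gold
# source-sentence indices for one target sentence. Illustrative only.

def trace_f1(predicted, gold):
    """Return (precision, recall, f1) for two index collections."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0, 1.0, 1.0  # both empty: perfect agreement
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(trace_f1([1, 2, 3], [2, 3, 4]))  # partial overlap
```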
- To compute the metrics:
python metrics.py

If you use TROVE in your research, please cite our paper:
@inproceedings{zhu-etal-2025-trove,
    title = "{TROVE}: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification",
    author = "Zhu, Junnan and
      Xiao, Min and
      Wang, Yining and
      Zhai, Feifei and
      Zhou, Yu and
      Zong, Chengqing",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.577/",
    pages = "11755--11771",
    ISBN = "979-8-89176-251-0"
}