TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
This is the official repository of TROVE (ACL 2025).
├── Part1_origin_data/ # Raw dataset
├── Part2_human_annotated/ # Human-annotated dataset
├── Part3_annotated_result/ # Merged human annotations and preliminary retrieval results
├── Part4_evaluation_data/ # Evaluation data for models (including prompts), with texts segmented into sentences
├── step1_process_annotation.py # Data processing, step 1: generates "Part3_annotated_result"
├── step2_generate_evaluation.py # Data processing, step 2: generates "Part4_evaluation_data"
├── inference_close_model.py # Inference for closed-source models
├── inference_open_model.py # Inference for open-source models
└── metrics.py # Computes the evaluation metrics
Please install the following:
- python 3.10
- torch 2.3.1
- transformers 4.40.2
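The pinned versions above can be installed with pip; a minimal sketch, assuming a fresh virtual environment (adjust the torch build to your CUDA setup):

```shell
# Create an isolated environment and install the pinned dependencies.
python3.10 -m venv .venv
source .venv/bin/activate
pip install torch==2.3.1 transformers==4.40.2
```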
Due to file size limits, the data folders are hosted on Google Drive.
Setup: unzip the downloaded archive and place the extracted subfolders directly in the root of this repository.
The data in "Part4_evaluation_data" is our complete dataset and can be directly used for model evaluation.
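To get oriented before running evaluation, you can inspect the top-level structure of a data file. A minimal sketch; the path passed in the commented call is purely hypothetical, so check the actual files for the real names and schema:

```python
# Hypothetical helper for inspecting an evaluation file's structure.
import json
from pathlib import Path

def peek(path):
    """Load a JSON file and report its top-level structure."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        print(f"{len(data)} records; first record keys: {sorted(data[0])}")
    else:
        print(f"top-level keys: {sorted(data)}")
    return data

# peek("Part4_evaluation_data/example.json")  # hypothetical filename
```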
- For open-source models:
sh ./open_infer.sh
- For closed-source models:
sh ./close_infer.sh
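Both inference scripts produce predictions that are then scored with metrics.py. As an illustration of one plausible sentence-level tracing metric, here is a minimal sketch of set-based precision/recall/F1 over source-sentence indices; this is an assumption for illustration, not necessarily what metrics.py implements:

```python
# Sketch: set-based precision/recall/F1 between predicted and gold
# source-sentence indices for one target sentence. Illustrative only.

def trace_f1(predicted, gold):
    """Return (precision, recall, f1) for two index collections."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0, 1.0, 1.0  # both empty: perfect agreement
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(trace_f1([1, 2, 3], [2, 3, 4]))  # partial overlap
```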
- To compute the metrics:
python metrics.py

If you use TROVE in your research, please cite our paper:
@inproceedings{zhu-etal-2025-trove,
    title = "{TROVE}: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification",
    author = "Zhu, Junnan and
      Xiao, Min and
      Wang, Yining and
      Zhai, Feifei and
      Zhou, Yu and
      Zong, Chengqing",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.577/",
    pages = "11755--11771",
    ISBN = "979-8-89176-251-0"
}