Kyu Won Kim*, Suhwan Choi*, Myeongho Jeon
ICML 2025 Workshop on Long-Context Foundation Models
This repository provides the official implementation of the VerbatimEval framework, introduced in the paper "Say as It Is: Verbatim Fidelity Evaluation of Long-Context Language Models".
Accurately processing long texts and generating precise responses remains a significant challenge for large language models (LLMs). While existing benchmarks evaluate long-text comprehension, they often overlook the models’ ability to faithfully preserve the exact wording, formatting, and sequence of prompts in their responses. To address this gap, we propose a novel evaluation framework with two key advantages: (i) adaptability across diverse domains and data sources, and (ii) tunable difficulty through dynamic variation of text length. Across three tasks—mathematical, contextual, and semantic reasoning—we find that even state-of-the-art long-context LLMs exhibit notable difficulty in maintaining verbatim fidelity during long-text generation.
The repository is organized as follows:
- `experiments.py`: Defines the core experiment classes, including number sorting, sentence shuffling, and entity grouping.
- `llm.py`: Provides a modular interface for interacting with different LLMs (e.g., OpenAI, Gemini).
- `exp.py`: The main script for running experiments.
- `exp.yaml`: Configuration file for the experiments.
- `requirements.txt`: A list of Python dependencies.
- `run.sh`: An example shell script to run experiments.
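As a rough illustration of the task structure, here is a minimal, hypothetical sketch of a number-sorting experiment. The class name, interface, and exact-match scoring below are assumptions for illustration, not the actual code in `experiments.py`:

```python
import random

# Hypothetical sketch: the real classes in experiments.py may differ.
class NumberSortExperiment:
    """Ask the model to sort a list of numbers and check verbatim fidelity."""

    def __init__(self, sample_size: int = 100, seed: int = 0):
        self.rng = random.Random(seed)
        self.sample_size = sample_size  # controls prompt length / difficulty

    def build_prompt(self) -> tuple[str, str]:
        """Generate a prompt and the expected verbatim answer."""
        numbers = [self.rng.randint(0, 10**6) for _ in range(self.sample_size)]
        expected = " ".join(str(n) for n in sorted(numbers))
        prompt = (
            "Sort the following numbers in ascending order and output them "
            "exactly, separated by single spaces:\n"
            + " ".join(str(n) for n in numbers)
        )
        return prompt, expected

    def score(self, response: str, expected: str) -> float:
        # Exact match: credit is given only when wording, formatting,
        # and sequence are all preserved.
        return float(response.strip() == expected)
```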
First, create a Conda environment with Python 3.10:
```bash
conda create -n verbatimeval python=3.10
conda activate verbatimeval
```

Install the required Python packages using pip:

```bash
pip install -r requirements.txt
```

The framework requires API keys for the language models you intend to use (e.g., OpenAI, Google Gemini). Create a `.env` file in the root directory and add your keys:
```
OPENAI_API_KEY="your_openai_api_key"
GOOGLE_API_KEY="your_google_api_key"
```
The `exp.py` script will load these environment variables.
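If you want to verify that your keys are picked up, the loading step presumably looks something like the following sketch, which assumes the `python-dotenv` package (check `requirements.txt` for the actual dependency):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Read OPENAI_API_KEY / GOOGLE_API_KEY from the .env file in the root directory.
load_dotenv()

openai_key = os.environ.get("OPENAI_API_KEY")
google_key = os.environ.get("GOOGLE_API_KEY")
if openai_key is None and google_key is None:
    raise RuntimeError("No API keys found; check your .env file.")
```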
You can run experiments using the `exp.py` script, which takes several arguments to specify the experiment, model, and other parameters.
To run the number sorting experiment with the `gpt-4o` model, use the following command:

```bash
python exp.py --save_name="my_first_exp" --exp_name="num_sort" --model_name="gpt-4o"
```

You can also use the provided shell script `run.sh` as a template for running experiments.
- `--save_name` (str): A unique name for the experiment run.
- `--exp_name` (str): The name of the experiment. Options: `num_sort`, `sentence_shuffle`, `grouping`.
- `--model_name` (str): The name of the model to use. See `llm.py` for supported models.
- `--config_path` (str, optional): Path to the configuration file. Defaults to `exp.yaml`.
- `--sample_size` (int, optional): Override the sample size in the config.
- `--num_test` (int, optional): Override the number of tests in the config.
- `--dataset_name` (str, optional): Override the dataset name in the config.
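These flags map naturally onto a standard `argparse` setup. The following is a hypothetical sketch of how `exp.py` might declare them, not the actual parser:

```python
import argparse

# Hypothetical parser mirroring the documented flags; exp.py may differ.
parser = argparse.ArgumentParser(description="Run a VerbatimEval experiment.")
parser.add_argument("--save_name", type=str, required=True,
                    help="Unique name for the experiment run.")
parser.add_argument("--exp_name", type=str, required=True,
                    choices=["num_sort", "sentence_shuffle", "grouping"],
                    help="Which experiment to run.")
parser.add_argument("--model_name", type=str, required=True,
                    help="Model identifier; see llm.py for supported models.")
parser.add_argument("--config_path", type=str, default="exp.yaml",
                    help="Path to the configuration file.")
parser.add_argument("--sample_size", type=int, default=None,
                    help="Override the sample size in the config.")
parser.add_argument("--num_test", type=int, default=None,
                    help="Override the number of tests in the config.")
parser.add_argument("--dataset_name", type=str, default=None,
                    help="Override the dataset name in the config.")
args = parser.parse_args()
```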