
Say as It Is: Verbatim Fidelity Evaluation of Long-Context Language Models

Kyu Won Kim*    Suhwan Choi*    Myeongho Jeon   

ICML 2025 Workshop on Long-Context Foundation Models

This repository provides the official implementation of the VerbatimEval framework, introduced in the paper "Say as It Is: Verbatim Fidelity Evaluation of Long-Context Language Models".

Abstract

Accurately processing long texts and generating precise responses remains a significant challenge for large language models (LLMs). While existing benchmarks evaluate long-text comprehension, they often overlook the models’ ability to faithfully preserve the exact wording, formatting, and sequence of prompts in their responses. To address this gap, we propose a novel evaluation framework with two key advantages: (i) adaptability across diverse domains and data sources, and (ii) tunable difficulty through dynamic variation of text length. Across three tasks—mathematical, contextual, and semantic reasoning—we find that even state-of-the-art long-context LLMs exhibit notable difficulty in maintaining verbatim fidelity during long-text generation.

Project Structure

The repository is organized as follows:

  • experiments.py: Defines the core experiment classes for number sorting, sentence shuffling, and entity grouping.
  • llm.py: Provides a modular interface for querying different LLM providers (e.g., OpenAI, Gemini); a minimal sketch of how these pieces fit together follows this list.
  • exp.py: The main script for running experiments.
  • exp.yaml: Configuration file for the experiments.
  • requirements.txt: A list of Python dependencies.
  • run.sh: An example shell script to run experiments.
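
To make the flow concrete, below is a minimal sketch of what a number-sorting run might look like. The class and method names here (NumberSortExperiment, build_prompt, score) are illustrative assumptions, not the actual API; see experiments.py and llm.py for the real implementations.

# Illustrative sketch only -- the real experiment classes live in experiments.py.
import random

class NumberSortExperiment:
    """Number sorting: the model must reproduce the inputs verbatim, in sorted order."""

    def __init__(self, sample_size: int):
        # sample_size controls how many numbers go into the prompt,
        # and hence the text length / task difficulty.
        self.sample_size = sample_size

    def build_prompt(self):
        numbers = [random.randint(0, 10**6) for _ in range(self.sample_size)]
        prompt = ("Sort the following numbers in ascending order and output "
                  "them exactly as given, one per line:\n"
                  + "\n".join(map(str, numbers)))
        return prompt, numbers

    def score(self, response: str, numbers) -> float:
        # Verbatim fidelity: each output line must match the expected line exactly.
        expected = [str(n) for n in sorted(numbers)]
        got = response.strip().splitlines()
        matches = sum(e == g for e, g in zip(expected, got))
        return matches / len(expected)

A score of 1.0 means every line of the model's output matched the expected sorted sequence exactly; varying sample_size is what gives the benchmark its tunable difficulty.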

Setup

1. Create Conda Environment

First, create a Conda environment with Python 3.10:

conda create -n verbatimeval python=3.10
conda activate verbatimeval

2. Install Dependencies

Install the required Python packages using pip:

pip install -r requirements.txt

3. Set Up API Keys

The framework requires API keys for the language models you intend to use (e.g., OpenAI, Google Gemini). Create a .env file in the root directory and add your keys:

OPENAI_API_KEY="your_openai_api_key"
GOOGLE_API_KEY="your_google_api_key"

The exp.py script will load these environment variables.
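
A quick way to verify the keys are picked up, assuming the python-dotenv package is used to read the .env file (check exp.py for the actual loading logic):

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads the .env file in the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is not set"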

Usage

You can run experiments using the exp.py script. The script takes several arguments to specify the experiment, model, and other parameters.

Example

To run the number sorting experiment with the gpt-4o model, use the following command:

python exp.py --save_name="my_first_exp" --exp_name="num_sort" --model_name="gpt-4o"

You can also use the provided shell script run.sh as a template for running experiments.

Arguments

  • --save_name (str): A unique name for the experiment run.
  • --exp_name (str): The experiment to run. Options: num_sort, sentence_shuffle, grouping.
  • --model_name (str): The model to use. See llm.py for supported models.
  • --config_path (str, optional): Path to the configuration file. Defaults to exp.yaml.
  • --sample_size (int, optional): Overrides the sample size in the config.
  • --num_test (int, optional): Overrides the number of tests in the config.
  • --dataset_name (str, optional): Overrides the dataset name in the config.
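
For reference, the command-line interface above could be reproduced with argparse roughly as follows. This is an illustrative sketch of the documented flags, not the actual contents of exp.py:

import argparse

parser = argparse.ArgumentParser(description="Run a VerbatimEval experiment.")
parser.add_argument("--save_name", required=True, help="Unique name for the run")
parser.add_argument("--exp_name", required=True,
                    choices=["num_sort", "sentence_shuffle", "grouping"])
parser.add_argument("--model_name", required=True, help="See llm.py for supported models")
parser.add_argument("--config_path", default="exp.yaml")
parser.add_argument("--sample_size", type=int, default=None)
parser.add_argument("--num_test", type=int, default=None)
parser.add_argument("--dataset_name", default=None)
args = parser.parse_args()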
