UCD

This README provides setup instructions and usage guidance to run evaluations on the TruthfulQA dataset using both a baseline model and the proposed UCD method.

📦 Environment Setup

1. Create and Activate Conda Environment

conda create -n ucd python==3.10
conda activate ucd

2. Install Required Packages

pip install -r requirements.txt

3. Install Transformers Locally (Editable Mode)

cd ./transformers
pip install --editable ./

🧪 Run TruthfulQA Evaluation (Table 1)

Step 1: Navigate to the Benchmark Script Directory

cd ./exp_scripts/benchmark

Step 2: Edit the Script

Before running truthfulqa.sh, make sure to edit the following variables in the script:

project_root_path: Root path to your project (e.g., "UCD")
model_name: Name or path of your experiment model
amateur_model_name: Name or path of your amateur model (only used for UCD)
output_path: Where to store the results and logs

Note: In this code, "amateur_model" refers to the base model in our paper. ⚠️ Important: Make sure that the models you intend to use (model_name and amateur_model_name) are already downloaded or accessible. This script does not handle model downloading.

Step 3: Run the Script

sh truthfulqa.sh

⚙️ Script Structure (truthfulqa.sh)

The script is structured to run across 8 GPUs using parallel execution with sharding. It supports two modes:

🔹 Baseline Inference

Evaluates using a single model in greedy decoding mode.

CMD="CUDA_VISIBLE_DEVICES=$i nohup python ${cli_path} \
    --model-name ${model_name} \
    --num-gpus 1 \
    --data-path ${data_path} \
    --output-path ${output_path}/result \
    --is-chat \
    --mode greedy \
    --parallel \
    --total-shard 8 \
    --shard-id $i \
    ${generation_args} \
    >${output_path}/shard_${i}.log 2>&1 &"

🔹 UCD Method

Evaluates using both the main model and an amateur model. Requires --mode UCD to activate the method.

CMD="CUDA_VISIBLE_DEVICES=$i nohup python ${cli_path} \
    --model-name ${model_name} \
    --amateur-model-name ${amateur_model_name} \
    --num-gpus 1 \
    --amateur-model-nums-gpus 1 \
    --data-path ${data_path} \
    --output-path ${output_path}/result \
    --is-chat \
    --mode UCD \
    --parallel \
    --relative_top 0.0 \
    --total-shard 8 \
    --shard-id $i \
    ${generation_args} \
    >${output_path}/shard_${i}.log 2>&1 &"

📁 Directory Structure Example

UCD/
├── data/
│   └── truthfulqa/
├── src/
│   └── benchmark_evaluation/
│       └── truthfulqa_eval.py
├── transformers/
│   └── (local transformers repo)
├── exp_scripts/
│   └── benchmark/
│       └── truthfulqa.sh
├── requirements.txt
└── README.md

Feel free to reach out for help if paths or settings don't work as expected!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UCD

📦 Environment Setup

1. Create and Activate Conda Environment

2. Install Required Packages

3. Install Transformers Locally (Editable Mode)

🧪 Run TruthfulQA Evaluation (Table 1)

Step 1: Navigate to the Benchmark Script Directory

Step 2: Edit the Script

Step 3: Run the Script

⚙️ Script Structure (truthfulqa.sh)

🔹 Baseline Inference

🔹 UCD Method

📁 Directory Structure Example

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data/truthfulqa		data/truthfulqa
exp_scripts/benchmark		exp_scripts/benchmark
src		src
transformers		transformers
README.md		README.md
requirements.txt		requirements.txt

MLAI-Yonsei/UCD

Folders and files

Latest commit

History

Repository files navigation

UCD

📦 Environment Setup

1. Create and Activate Conda Environment

2. Install Required Packages

3. Install Transformers Locally (Editable Mode)

🧪 Run TruthfulQA Evaluation (Table 1)

Step 1: Navigate to the Benchmark Script Directory

Step 2: Edit the Script

Step 3: Run the Script

⚙️ Script Structure (truthfulqa.sh)

🔹 Baseline Inference

🔹 UCD Method

📁 Directory Structure Example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages