- Paper: InFoBench: Evaluating Instruction Following Ability in Large Language Models
- Dataset: InFoBench Dataset
- Generation and Annotation: InFoBench Generation and Annotation
```bibtex
@article{qin2024infobench,
  title={InFoBench: Evaluating Instruction Following Ability in Large Language Models},
  author={Yiwei Qin and Kaiqiang Song and Yebowen Hu and Wenlin Yao and Sangwoo Cho and Xiaoyang Wang and Xuansheng Wu and Fei Liu and Pengfei Liu and Dong Yu},
  year={2024},
  eprint={2401.03601},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
You can download it directly with Hugging Face `datasets`:

```python
from datasets import load_dataset

dataset = load_dataset("kqsong/InFoBench")
```

Provide an output file in `model/output.json`.
Each data entry should be a JSON object on its own line, containing all the fields from the input format.
The generated response should be included in the JSON object as a new field named `output`.
We suggest using greedy decoding to avoid randomness in decoding.
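A minimal sketch of this generation step is shown below, assuming a Hugging Face causal LM. The model name, prompt construction, the `instruction`/`input` field names, and the `train` split are illustrative assumptions, not prescribed by the repo; substitute whatever matches the model you are evaluating.

```python
import json
import os

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; replace with the model under evaluation.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# "train" split name is an assumption; check dataset.keys() if it differs.
dataset = load_dataset("kqsong/InFoBench")["train"]

os.makedirs("model", exist_ok=True)
with open("model/output.json", "w") as f:
    for example in dataset:
        # Prompt construction is an assumption: concatenate the instruction
        # with the (possibly empty) input field.
        prompt = example["instruction"] + "\n" + (example.get("input") or "")
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        generated = model.generate(
            **inputs,
            max_new_tokens=1024,
            do_sample=False,  # greedy decoding, as suggested above
        )
        # Decode only the newly generated tokens.
        response = tokenizer.decode(
            generated[0][inputs["input_ids"].shape[1]:],
            skip_special_tokens=True,
        )
        # Keep all original fields and add the response under "output".
        entry = dict(example)
        entry["output"] = response
        f.write(json.dumps(entry) + "\n")
```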
Evaluate the LLM's outputs on the decomposed questions. This research uses GPT-4-0314 as the default evaluator:
```bash
python evaluation.py \
  --api_key <OPENAI KEY> \
  --eval_model gpt-4-0314 \
  --input model/output.json \
  --output_dir evaluation/ \
  --temperature 0
```

Each data entry will include an "eval" key in the format of `List[bool]`, which represents "Yes" or "No" answers to each decomposed question.
The final evaluation file will be saved in JSON format under `<output_dir>/<eval_model>/`.
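For an aggregate score, the paper's Decomposed Requirements Following Ratio (DRFR) is the fraction of decomposed questions answered "Yes". Below is a minimal sketch for computing it from the saved evaluation file; the exact filename under `<output_dir>/<eval_model>/` and whether it is a JSON array or JSON Lines are assumptions, so point the path at the file the script actually produces.

```python
import json

def drfr(path):
    """Compute the Decomposed Requirements Following Ratio:
    the fraction of decomposed questions judged "Yes" (True)."""
    total, satisfied = 0, 0
    with open(path) as f:
        entries = json.load(f)  # adjust if the file is JSON Lines instead
    for entry in entries:
        evals = entry["eval"]  # List[bool], one per decomposed question
        total += len(evals)
        satisfied += sum(evals)
    return satisfied / total if total else 0.0

# Hypothetical path; use the file written under <output_dir>/<eval_model>/.
print(f"DRFR: {drfr('evaluation/gpt-4-0314/output.json'):.4f}")
```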