LibrA-Eval

Introduction

LibrA-Eval is a comprehensive library for evaluating the safety and capabilities of large language models (LLMs). It powers the Libra-Leaderboard, where you can compare different LLMs across various safety and robustness metrics. For details about our evaluation tasks and datasets, check out our documentation.

Quick Start

  1. Installation

    git clone https://github.com/LibrAIResearch/libra-eval
    cd libra-eval
    pip install -e .
  2. Configuration

    Before running evaluations, set up your API keys in libra_eval/config/api_config.json based on your needs (a sample layout is sketched after this list):

    API Key Requirements:

    • LIBRAI_API_KEY: Required for all evaluations since most evaluators are hosted on Libra-prompter. Get your key here
    • OPENAI_API_KEY: Only required if you want to evaluate OpenAI-hosted models
    • NEXT_API_KEY: Only required if you want to evaluate models hosted on OpenAI-Next. Get your key here
    • Local Models: If you're evaluating locally hosted models, no API keys are required except LIBRAI_API_KEY
  3. Basic Usage

    # Run a single evaluation task (requires OPENAI_API_KEY)
    python -m libra_eval.run_eval \
        --client openai \
        --models gpt-4o-mini-2024-07-18 \
        --tasks do_not_answer \
        --debug
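
Returning to step 2, a minimal libra_eval/config/api_config.json could look like the sketch below. The flat key layout is an assumption based on the key names listed above, so compare it against the api_config.json in your checkout before relying on it, and leave unused keys empty.

    {
        "LIBRAI_API_KEY": "your-librai-key",
        "OPENAI_API_KEY": "your-openai-key",
        "NEXT_API_KEY": "your-next-key"
    }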

Detailed Usage Guide

Command Line Options

  • --client: LLM provider ("openai", "local", "next")
  • --models: Model names (comma-separated, or "all")
  • --tasks: Task names (comma-separated, or "all")
  • --debug: Enable debug mode
  • --rewrite_cache: Force new evaluation instead of using cached results
  • --n_samples_per_task: Number of samples to evaluate
  • --output_dir: Output directory (default: "./outputs")
  • --mode: Execution mode (see the two-stage example after this list)
    • "full": Complete pipeline (default)
    • "inference": Only generate responses
    • "evaluation": Only evaluate existing responses

Example with Multiple Models

# Running the following evaluation with the next client requires NEXT_API_KEY
python -m libra_eval.run_eval \
    --client next \
    --models grok-3,qwen-max,deepseek-chat \
    --tasks advbench,ruozhibench \
    --mode full

Output Structure

Results are organized in three directories:

  • outputs/evaluations/: Evaluation results
  • outputs/responses/: Model responses
  • outputs/results/: Processed final results
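
The inference stage writes to outputs/responses/, the evaluation stage reads those responses and writes to outputs/evaluations/, and outputs/results/ holds the aggregated summaries. A hypothetical run could leave a layout like the tree below; the exact file names are an assumption and depend on the model and task identifiers you pass in.

outputs/
├── responses/      # raw model outputs per model/task pair
├── evaluations/    # per-sample evaluator judgments
└── results/        # processed, aggregated scores used for reporting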

Analyzing Results

# Collect and summarize results
python -m libra_eval.utils.results [--output_dir ./outputs] [--models all] [--tasks all]

# List available models
python -m libra_eval.utils.models

# List available tasks
python -m libra_eval.utils.tasks
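
To summarize only the Quick Start run above, the bracketed arguments can be filled in with concrete values; this sketch assumes the utility accepts the same model and task identifiers as run_eval.

python -m libra_eval.utils.results \
    --output_dir ./outputs \
    --models gpt-4o-mini-2024-07-18 \
    --tasks do_not_answer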

Important Notes

⚠️ Security Warnings:

  • Never commit API keys
  • Don't include config/ and outputs/ folders in commits

Additional Resources

Citation

If you use LibrA-Eval in your research, please cite:

@misc{li2024libraleaderboardresponsibleaibalanced,
    title={Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability},
    author={Haonan Li and Xudong Han and others},
    year={2024},
    eprint={2412.18551},
    archivePrefix={arXiv}
}

Contact & Contributing

License

Licensed under the LIBRAI TECHNOLOGIES LTD Software License Agreement. See LICENSE.md for details.
