CTFBench (ctfbench.com) is a benchmark designed for evaluating AI-powered smart contract auditors. This repository contains the methodology, test contracts, and documentation to help developers assess the effectiveness of automated auditors using objective measures.
See the benchmark_data folder for the test contracts and reports.
Set your OpenRouter API key in the .env file.
Run
python tools/bench_synopsis.py ./benchmark_data/synopsis ./benchmark_data/reports/with_errors/savant
You will get the number of synopsis matches found in the reports for the vulnerable contracts.
Run
python tools/bench_zero_errors.py ./benchmark_data/reports/no_errors/savant
You will get the number of false positives or non-critical recommendations found in the reports for the non-vulnerable contracts.
In the current landscape, smart contract security tools struggle with a trade-off between detecting vulnerabilities and minimizing false positives. CTFBench addresses this challenge by introducing two key metrics:
- Vulnerability Detection Rate (VDR): The ratio of correctly identified vulnerabilities to the total number of vulnerabilities in the test set.
- Overreporting Index (OI): The proportion of false positives relative to total alerts.
Each test contract in CTFBench contains exactly one predetermined vulnerability, allowing for clear and objective verification of tool performance.
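The two metrics can be sketched in a few lines of Python. This is a minimal illustration based only on the definitions given in this README, not the repository's own scoring code: the `ContractResult` record and both function names are hypothetical, and the VDR denominator is assumed to be the number of vulnerable test contracts (one planted vulnerability each). See article.md for the authoritative definitions.

```python
from dataclasses import dataclass


@dataclass
class ContractResult:
    """Outcome of auditing one test contract (hypothetical record layout)."""
    detected: bool        # True if the planted vulnerability was reported
    false_positives: int  # alerts that do not match the planted vulnerability
    total_alerts: int     # every alert the auditor raised for this contract


def vulnerability_detection_rate(results):
    """Fraction of contracts whose single planted vulnerability was found."""
    if not results:
        return 0.0
    return sum(r.detected for r in results) / len(results)


def overreporting_index(results):
    """False positives divided by all alerts, pooled across contracts."""
    alerts = sum(r.total_alerts for r in results)
    if alerts == 0:
        return 0.0
    return sum(r.false_positives for r in results) / alerts


# Made-up example data for four vulnerable contracts.
results = [
    ContractResult(detected=True, false_positives=1, total_alerts=2),
    ContractResult(detected=False, false_positives=2, total_alerts=2),
    ContractResult(detected=True, false_positives=0, total_alerts=1),
    ContractResult(detected=True, false_positives=0, total_alerts=1),
]

print(f"VDR = {vulnerability_detection_rate(results):.2f}")  # 3 of 4 found
print(f"OI  = {overreporting_index(results):.2f}")           # 3 of 6 alerts
```

Pooling alerts across contracts before dividing is one possible choice; averaging a per-contract ratio would weight noisy and quiet reports differently.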
- article.md: A comprehensive description of the CTFBench methodology along with an in-depth analysis of auditor typologies using the VDR–OI space.
- README.md: This file, providing an overview of the benchmark and instructions for getting started.
- assets/: Directory for images and visualizations illustrating the benchmark methodology.
- Clone the repository:
  git clone https://github.com/YOUR_USERNAME/CTFBench.git
- Dive into the documentation:
  - Read article.md for detailed insights into the benchmark methodology.
  - Check the assets/ folder for visual aids and diagrams.
- Usage:
  - Integrate your AI-based smart contract auditing tool with the test contracts provided.
  - Evaluate its performance by calculating the VDR and OI metrics.
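As a rough sketch of the integration step, an auditor can be wrapped in a function that takes contract source and returns a report, then run over a directory of contracts. Everything here is an assumption made for illustration: `run_auditor` and `dummy_auditor` are hypothetical names, the `.sol` extension is assumed for the test contracts, and the demo uses a temporary directory rather than the real benchmark_data layout.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_auditor(contract_dir: str, audit: Callable[[str], str]) -> Dict[str, str]:
    """Run `audit` on every .sol file under contract_dir and collect the reports."""
    root = Path(contract_dir)
    reports = {}
    for path in sorted(root.rglob("*.sol")):
        source = path.read_text(encoding="utf-8")
        # Key each report by the contract's path relative to the root.
        reports[path.relative_to(root).as_posix()] = audit(source)
    return reports


def dummy_auditor(source: str) -> str:
    """Placeholder auditor: replace with a call to your own tool."""
    return "possible tx.origin misuse" if "tx.origin" in source else "no findings"


# Demo on two throwaway contracts written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Wallet.sol").write_text(
        "contract Wallet { function f() public { require(tx.origin == msg.sender); } }",
        encoding="utf-8",
    )
    (Path(tmp) / "Token.sol").write_text("contract Token {}", encoding="utf-8")
    reports = run_auditor(tmp, dummy_auditor)

for name, report in reports.items():
    print(name, "->", report)
```

The collected reports can then be compared against the expected finding for each contract to obtain the counts needed for VDR and OI.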
Contributions are welcome! If you have suggestions, test cases, or improvements in mind, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.