Description
Hi AssetOpsBench team,
While reviewing the benchmark and leaderboard, I noticed that results are currently reported separately for each evaluation dimension (six at present). I'd like to propose integrating an aggregation method we recently introduced, which allows a more holistic comparison across those dimensions.
We propose AGI_AUC, an aggregation technique that combines multiple evaluation dimensions into a single indicator intended to measure the general intelligence of a system across the considered benchmarks. Unlike a simple arithmetic mean, AGI_AUC is designed to avoid overstating performance and to better expose weaknesses across dimensions.
Details:
Technical formulation: Equations 2 and 3 in our paper
https://arxiv.org/pdf/2510.20784
Reference implementation:
https://github.com/fouratifares/coherence-agi
The method supports any number of dimensions and has been applied across several benchmarks, including 17 used by the Gemini team.
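To make the contrast with the arithmetic mean concrete, here is a rough sketch of the general idea: average a generalized (power) mean over a range of exponents, so that weak dimensions pull the aggregate down. This is an illustration under stated assumptions, not the reference implementation. The exponent range [-1, 1], the trapezoidal integration, the function names `power_mean` and `auc_aggregate`, and the example scores are all hypothetical; the exact definition is given by Equations 2 and 3 in the paper and by the linked repository.

```python
import math


def power_mean(scores, p):
    """Generalized (power) mean of scores in [0, 1] with exponent p."""
    n = len(scores)
    # For p <= 0, any zero score drives the mean to 0 (taken as the limit).
    if p <= 0 and min(scores) == 0.0:
        return 0.0
    # p -> 0 is the geometric mean.
    if abs(p) < 1e-12:
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


def auc_aggregate(scores, p_min=-1.0, p_max=1.0, steps=200):
    """Average of power_mean over p in [p_min, p_max] (trapezoidal rule).

    The range [-1, 1] is an assumption made for this sketch.
    """
    h = (p_max - p_min) / steps
    values = [power_mean(scores, p_min + i * h) for i in range(steps + 1)]
    area = h * (sum(values) - 0.5 * (values[0] + values[-1]))
    return area / (p_max - p_min)


if __name__ == "__main__":
    # Hypothetical per-dimension scores for two systems (six dimensions each).
    balanced = [0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
    uneven = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1]
    for name, scores in [("balanced", balanced), ("uneven", uneven)]:
        arithmetic = sum(scores) / len(scores)
        print(name, "arithmetic mean:", round(arithmetic, 3),
              "AUC-style aggregate:", round(auc_aggregate(scores), 3))
```

In this sketch, a system with equal scores on every dimension gets the same value from both aggregates, while the uneven system scores below its arithmetic mean because the exponents below 1 penalize its weak dimension.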
I believe AGI_AUC could be a useful optional aggregate metric for AssetOpsBench (e.g., alongside per-dimension scores).
Looking forward to feedback and discussion.
Best,
Fares