Context
Tokenomics reliably measures cost and latency, but quality evaluation
is currently heuristic and informal.
As routing and compression strategies evolve, we need a way to ensure
optimizations do not silently degrade output quality.
Open problem
- Define repeatable evaluation benchmarks
- Introduce quality scoring or acceptance thresholds
- Add regression tests that fail when quality drops beyond tolerance
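
As a starting point for discussion, the three items above could be sketched roughly as follows. This is a minimal illustration, not Tokenomics code: the names `token_f1`, `mean_score`, `check_regression`, the sample benchmark data, and the 0.05 tolerance are all assumptions made for the example. Token-overlap F1 stands in for whatever quality score is eventually chosen.

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """Coarse quality proxy: token-overlap F1 between candidate and reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def mean_score(outputs: dict[str, str], references: dict[str, str]) -> float:
    """Average score over a fixed benchmark keyed by case id."""
    return sum(token_f1(outputs[k], references[k]) for k in references) / len(references)

def check_regression(baseline: float, current: float, tolerance: float = 0.05) -> None:
    """Raise if quality dropped by more than `tolerance` (absolute difference)."""
    if baseline - current > tolerance:
        raise AssertionError(
            f"quality regression: {baseline:.3f} -> {current:.3f} (tolerance {tolerance})"
        )

# Hypothetical benchmark: fixed references plus outputs from two configurations.
REFERENCES = {
    "q1": "paris is the capital of france",
    "q2": "water boils at 100 degrees celsius",
}
BASELINE_OUTPUTS = dict(REFERENCES)  # pretend the unoptimized path matches exactly
COMPRESSED_OUTPUTS = {
    "q1": "paris is the capital",
    "q2": "water boils at 100 degrees",
}
```

In a test suite, `check_regression(mean_score(BASELINE_OUTPUTS, REFERENCES), mean_score(COMPRESSED_OUTPUTS, REFERENCES))` would fail for the sample data, because the truncated outputs lose recall against the references. A real setup would presumably store the baseline score alongside the benchmark and swap in a better scorer over time.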
Notes
This does not require perfect "ground truth" quality metrics;
even coarse guardrails would be a meaningful improvement.