The current cascading strategy routes requests to cheaper models first and escalates only on failure. This creates a structural bias where cheaper models dominate traffic and premium models are under-explored.
As a result, the bandit receives biased feedback and cannot accurately estimate the value of premium models.