Cache hits bypass the learning loop

Cache hits currently bypass routing, bandit decision-making, and reward attribution entirely. From the learning system’s perspective, any reward observed for a cached response arrives without an associated action.

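This breaks the feedback loop between decisions and outcomes: with no recorded action, the bandit cannot credit the reward to any arm, so outcomes on cached traffic never inform later routing decisions.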
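
The gap can be illustrated with a minimal sketch. Everything here is hypothetical and is not the system's actual code: the `Bandit` class is a toy epsilon-greedy learner, `handle` stands in for the request path, and the arm names `model_a` and `model_b` are placeholders. The point is only that the cache-hit branch returns before any arm is selected, so the reward that follows has nothing to attach to.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bandit:
    """Toy epsilon-greedy bandit; a hypothetical stand-in for the real learner."""
    arms: list[str]
    epsilon: float = 0.1
    counts: dict[str, int] = field(default_factory=dict)
    values: dict[str, float] = field(default_factory=dict)

    def select(self) -> str:
        # Explore at random until at least one arm has been credited.
        if random.random() < self.epsilon or not self.counts:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values.get(a, 0.0))

    def update(self, arm: Optional[str], reward: float) -> bool:
        """Credit the reward to an arm; return False if there is no arm."""
        if arm is None:
            return False  # reward arrived without an associated action
        n = self.counts.get(arm, 0) + 1
        self.counts[arm] = n
        old = self.values.get(arm, 0.0)
        self.values[arm] = old + (reward - old) / n  # incremental mean
        return True


def handle(request: str, cache: dict[str, str],
           bandit: Bandit) -> tuple[str, Optional[str]]:
    """Return (response, arm). The arm is None on a cache hit."""
    if request in cache:
        # Cache hit: routing and the bandit decision are skipped entirely.
        return cache[request], None
    arm = bandit.select()
    response = f"{arm}:{request}"  # placeholder for the routed call
    cache[request] = response
    return response, arm


bandit = Bandit(arms=["model_a", "model_b"])
cache: dict[str, str] = {}

_, miss_arm = handle("q1", cache, bandit)  # miss: an arm is chosen
_, hit_arm = handle("q1", cache, bandit)   # hit: no arm is chosen

miss_credited = bandit.update(miss_arm, 1.0)  # learner sees this reward
hit_credited = bandit.update(hit_arm, 1.0)    # learner cannot use this one
```

In this sketch the second reward is dropped because `hit_arm` is `None`. A system that instead fed such rewards into the learner would have to guess which action earned them, which is the broken feedback loop described above.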