Context
Exact and semantic caching can deliver significant cost savings, but LLM outputs are probabilistic and context-sensitive, so naive caching risks serving stale or incorrect responses.
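As a minimal sketch of the two lookup paths, assuming a caller-supplied embedding function and an in-memory store (both hypothetical, not part of the original notes): exact caching keys on a hash of the normalized prompt, and semantic caching falls back to nearest-neighbor search over prompt embeddings with a similarity cutoff.

```python
import hashlib
import time

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class PromptCache:
    """Exact lookup by prompt hash, with a semantic fallback by embedding similarity."""

    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed          # hypothetical embedding function: str -> np.ndarray
        self.threshold = threshold  # semantic-match cutoff (see threshold tuning below)
        self.exact = {}             # prompt hash -> cache entry
        self.entries = []           # [(embedding, cache entry), ...]

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def get(self, prompt: str):
        # Exact hit: an identical (normalized) prompt was seen before.
        entry = self.exact.get(self._key(prompt))
        if entry is not None:
            return entry["response"]
        # Semantic hit: closest prior prompt, accepted only above the threshold.
        query = self.embed(prompt)
        best_sim, best_entry = 0.0, None
        for emb, cached in self.entries:
            sim = cosine(query, emb)
            if sim > best_sim:
                best_sim, best_entry = sim, cached
        if best_entry is not None and best_sim >= self.threshold:
            return best_entry["response"]
        return None  # miss: the caller invokes the LLM and then calls put()

    def put(self, prompt: str, response: str):
        entry = {"response": response, "created": time.time()}
        self.exact[self._key(prompt)] = entry
        self.entries.append((self.embed(prompt), entry))
```

The "stale or incorrect reuse" risk lives in exactly two places here: the entry's age (nothing expires it) and the fixed similarity threshold, which is what the open problems below are about.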
Open problems
- TTL vs. trust-decay strategies (first sketch below)
- Semantic cache threshold tuning (second sketch below)
- When cached responses should be invalidated or revalidated (third sketch below)
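On the first item, a brief sketch of the two policies, with illustrative parameter names (half_life_s, min_trust) that are assumptions rather than anything fixed in these notes: a hard TTL drops an entry after a fixed age, while trust decay lowers a confidence score continuously and allows reuse only while that score stays above a floor.

```python
import math
import time


def ttl_valid(created: float, ttl_s: float = 3600.0, now: float | None = None) -> bool:
    """Hard cutoff: the entry is either fresh or expired."""
    now = time.time() if now is None else now
    return (now - created) <= ttl_s


def trust(created: float, half_life_s: float = 3600.0, now: float | None = None) -> float:
    """Exponential trust decay: confidence halves every half_life_s seconds."""
    now = time.time() if now is None else now
    return math.exp(-math.log(2) * (now - created) / half_life_s)


def reuse_allowed(created: float, min_trust: float = 0.5) -> bool:
    """Trust-decay policy: reuse only while decayed confidence stays above a floor."""
    return trust(created) >= min_trust
```

A TTL is simple but binary; trust decay gives a graded score that can use different floors per query class and composes naturally with the revalidation sketch further down.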
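On the second item, one common approach, assumed here rather than prescribed by these notes, is to sweep the similarity threshold over a small labeled set of (query, cached prompt) pairs and pick the value that best trades false hits against hit rate.

```python
import numpy as np


def sweep_threshold(similarities: np.ndarray, same_answer: np.ndarray,
                    candidates: np.ndarray = np.linspace(0.80, 0.99, 20)):
    """Pick the threshold maximizing F1 of "safe to reuse" on labeled pairs.

    similarities : cosine similarity of each (query, cached prompt) pair
    same_answer  : 1 if reusing the cached response was judged correct, else 0
    """
    best_t, best_f1 = None, -1.0
    for t in candidates:
        pred = similarities >= t
        tp = np.sum(pred & (same_answer == 1))
        fp = np.sum(pred & (same_answer == 0))
        fn = np.sum(~pred & (same_answer == 1))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1
```

The hard part is that the labels themselves are context-sensitive: a pair that is "the same question" for one user or point in time may not be for another, which is why this remains open rather than a one-time calibration.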
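On the third item, a lightweight revalidation pattern (an assumed design, not the only one): rather than deleting an entry outright, re-run the prompt when its trust drops below a floor and compare the fresh answer to the cached one before continuing to serve it. The call_llm and same_meaning callables are hypothetical, and trust_score would come from something like trust() in the sketch above.

```python
import time


def serve_with_revalidation(entry: dict, prompt: str, trust_score: float,
                            call_llm, same_meaning, trust_floor: float = 0.5) -> str:
    """Reuse the cached response while trusted; otherwise re-check it before serving.

    trust_score  : decayed confidence for this entry
    call_llm     : hypothetical function, prompt -> fresh response string
    same_meaning : hypothetical judge, (old, new) -> bool (embedding match, LLM-as-judge, ...)
    """
    if trust_score >= trust_floor:
        return entry["response"]          # still trusted: reuse as-is
    fresh = call_llm(prompt)              # trust too low: revalidate against a fresh call
    if same_meaning(entry["response"], fresh):
        entry["created"] = time.time()    # answer confirmed: reset the entry's age
        return entry["response"]
    entry["response"] = fresh             # answer drifted: invalidate and replace
    entry["created"] = time.time()
    return fresh
```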
This is a fundamental challenge in LLM systems, not something a one-off fix will resolve.