chore(ai-proxy): add support of recording the cache token count for Gemini/OpenAI #14802
Summary
Currently, ai-proxy only records the prompt/completion token counts, not the prompt cache token count, which would help users observe the cache hit ratio and improve performance.
This PR introduces `prompt_cache_tokens` to record the cached token count for Gemini and OpenAI.

Checklist
- Changelog file added under changelog/unreleased/kong, or skip-changelog label added on the PR if a changelog is unnecessary.
- README.md updated if needed.
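For context, the sketch below (Python, not Kong's actual Lua implementation) shows where each provider reports cached prompt tokens in its usage payload; the field names follow the public OpenAI (`usage.prompt_tokens_details.cached_tokens`) and Gemini (`usageMetadata.cachedContentTokenCount`) response formats, and the helper name is hypothetical:

```python
def extract_prompt_cache_tokens(provider: str, response: dict) -> int:
    """Return the cached-prompt-token count from a provider response body.

    Sketch only: illustrates the response fields this PR reads; the real
    plugin code lives in Kong's Lua codebase.
    """
    if provider == "openai":
        # OpenAI reports cached tokens under usage.prompt_tokens_details
        usage = response.get("usage", {})
        details = usage.get("prompt_tokens_details") or {}
        return details.get("cached_tokens", 0)
    if provider == "gemini":
        # Gemini reports cached tokens under usageMetadata
        meta = response.get("usageMetadata", {})
        return meta.get("cachedContentTokenCount", 0)
    # Providers without cache reporting contribute zero
    return 0


if __name__ == "__main__":
    openai_resp = {
        "usage": {
            "prompt_tokens": 120,
            "prompt_tokens_details": {"cached_tokens": 64},
        }
    }
    gemini_resp = {
        "usageMetadata": {
            "promptTokenCount": 120,
            "cachedContentTokenCount": 64,
        }
    }
    print(extract_prompt_cache_tokens("openai", openai_resp))  # 64
    print(extract_prompt_cache_tokens("gemini", gemini_resp))  # 64
```

Comparing `prompt_cache_tokens` against the total prompt tokens over time gives the cache hit ratio the summary mentions.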