
Conversation

@git-hulk (Contributor) commented Dec 9, 2025

Summary

Currently, ai-proxy only records prompt/completion token counts, not prompt cache tokens, which would help users observe the cache hit ratio and improve performance.

This PR introduces prompt_cache_tokens to record cached prompt token counts for Gemini and OpenAI, for example:

kong_ai_llm_tokens_total{ai_provider="gemini",ai_model="gemini-2.5-flash",cache_status="",vector_db="",embeddings_provider="",embeddings_model="",token_type="prompt_cache_tokens",workspace="default"} 1861
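For context, here is a minimal Lua sketch of where the cached-token counts live in each provider's decoded response body. It is illustrative only, not the PR's driver code; the field names (`usageMetadata.cachedContentTokenCount` for Gemini, `usage.prompt_tokens_details.cache_tokens` for OpenAI) are taken from this PR's description and review and should be checked against each provider's current response schema.

```lua
-- Illustrative sketch only, not the code added in this PR.
-- Pulls a cached-prompt-token count out of an already-decoded response body.
local function get_prompt_cache_tokens(provider, body)
  if provider == "gemini" then
    -- Gemini reports cached tokens under usageMetadata.cachedContentTokenCount
    local usage = body.usageMetadata or {}
    return usage.cachedContentTokenCount or 0

  elseif provider == "openai" then
    -- OpenAI-compatible responses carry it under usage.prompt_tokens_details
    local usage = body.usage or {}
    local details = usage.prompt_tokens_details or {}
    return details.cache_tokens or 0
  end

  return 0
end
```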

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong, or the skip-changelog label has been added to the PR if a changelog is unnecessary (see README.md)
  • There is a user-facing docs PR against https://github.com/Kong/developer.konghq.com - PUT DOCS PR HERE

…emini

Currently, ai-proxy only records prompt/completion token counts, not prompt cache tokens, which would help users observe the cache hit ratio and improve performance.

This PR introduces `prompt_cache_tokens` to record cached prompt tokens for Gemini and OpenAI.
Copilot AI review requested due to automatic review settings December 9, 2025 07:04

Copilot AI left a comment


Pull request overview

This PR adds support for recording prompt cache token counts for Gemini and OpenAI LLM providers, enabling users to observe cache hit ratios and improve performance monitoring.

Key Changes:

  • Added prompt_cache_tokens field to the metrics schema and analytics serialization (see the sketch after this list)
  • Implemented cache token extraction for Gemini (via cachedContentTokenCount) and OpenAI (via prompt_tokens_details.cache_tokens)
  • Updated test expectations to include prompt_cache_tokens = 0 in expected chat statistics
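A hedged sketch of the serialization side, assuming a flat usage table like the one the ai-proxy specs assert on; the real filter in kong/llm/plugin/shared-filters/serialize-analytics.lua may structure this differently. The zero fallback matches the `prompt_cache_tokens = 0` that the updated tests expect when a provider reports no cached tokens.

```lua
-- Hypothetical shape of the serialized usage block once the new field is
-- included; prompt_cache_tokens falls back to 0 when nothing was cached.
local function serialize_usage(stats)
  return {
    prompt_tokens       = stats.prompt_tokens or 0,
    completion_tokens   = stats.completion_tokens or 0,
    total_tokens        = stats.total_tokens or 0,
    prompt_cache_tokens = stats.prompt_cache_tokens or 0,
  }
end
```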

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Summary per file:

| File | Description |
| --- | --- |
| kong/llm/plugin/observability.lua | Added llm_prompt_cache_tokens_count to the metrics schema |
| kong/llm/drivers/gemini.lua | Introduced extract_usage helper function to extract cache tokens from Gemini's cachedContentTokenCount |
| kong/llm/adapters/gemini.lua | Added prompt_cache_tokens and total_tokens extraction from Gemini response metadata |
| kong/llm/drivers/shared.lua | Added cache tokens to analytics container and observability metrics |
| kong/llm/plugin/shared-filters/parse-json-response.lua | Added cache token extraction for both native formats (Gemini) and OpenAI format (via prompt_tokens_details.cache_tokens) |
| kong/llm/plugin/shared-filters/serialize-analytics.lua | Included prompt_cache_tokens in serialized usage analytics |
| kong/plugins/ai-request-transformer/filters/transform-request.lua | Added llm_prompt_cache_tokens_count to filter output schema and context |
| kong/plugins/ai-response-transformer/filters/transform-response.lua | Added llm_prompt_cache_tokens_count to filter output schema and context |
| kong/reports.lua | Added AI_PROMPT_CACHE_TOKENS_COUNT_KEY constant and tracking for cache token counts |
| kong/plugins/prometheus/exporter.lua | Added Prometheus metric collection for prompt_cache_tokens |
| spec/03-plugins/38-ai-proxy/02-openai_integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
| spec/03-plugins/38-ai-proxy/09-streaming_integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
| spec/03-plugins/38-ai-proxy/11-gemini_integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
| spec/03-plugins/39-ai-request-transformer/02-integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
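To connect the exporter row above to the sample time series in the summary, here is a hypothetical sketch in the style of the nginx-lua-prometheus counter API that Kong's exporter builds on; the real exporter.lua registers and increments its counters centrally, so treat the names, help text, and wiring here as assumptions.

```lua
-- Hypothetical sketch (assumes an OpenResty worker with a shared dict named
-- "prometheus_metrics"); shows only the label shape of the sample metric above.
local prometheus = require("prometheus").init("prometheus_metrics")

local ai_llm_tokens = prometheus:counter(
  "kong_ai_llm_tokens_total",
  "AI tokens used per provider/model in Kong",
  { "ai_provider", "ai_model", "cache_status", "vector_db",
    "embeddings_provider", "embeddings_model", "token_type", "workspace" })

-- Record 1861 cached prompt tokens for a Gemini request in the default workspace.
ai_llm_tokens:inc(1861, { "gemini", "gemini-2.5-flash", "", "", "", "",
                          "prompt_cache_tokens", "default" })
```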
