
Conversation

@git-hulk (Contributor) commented Dec 9, 2025

Summary

Currently, ai-proxy only records prompt/completion token counts, not prompt cache tokens, which would help users observe the cache hit ratio and improve performance.

This PR introduces prompt_cache_tokens to record cached prompt token counts for Gemini and OpenAI, for example:

kong_ai_llm_tokens_total{ai_provider="gemini",ai_model="gemini-2.5-flash",cache_status="",vector_db="",embeddings_provider="",embeddings_model="",token_type="prompt_cache_tokens",workspace="default"} 1861
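For context, here is a minimal Lua sketch of where the cached-token counts live in each provider's decoded response body. It is illustrative only, not the PR's driver code; the field names (`usageMetadata.cachedContentTokenCount` for Gemini, `usage.prompt_tokens_details.cache_tokens` for OpenAI) are taken from this PR's description and review and should be checked against each provider's current response schema.

```lua
-- Illustrative sketch only, not the code added in this PR.
-- Pulls a cached-prompt-token count out of an already-decoded response body.
local function get_prompt_cache_tokens(provider, body)
  if provider == "gemini" then
    -- Gemini reports cached tokens under usageMetadata.cachedContentTokenCount
    local usage = body.usageMetadata or {}
    return usage.cachedContentTokenCount or 0

  elseif provider == "openai" then
    -- OpenAI-compatible responses carry it under usage.prompt_tokens_details
    local usage = body.usage or {}
    local details = usage.prompt_tokens_details or {}
    return details.cache_tokens or 0
  end

  return 0
end
```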

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong, or the skip-changelog label has been added to the PR if a changelog is unnecessary (see README.md)
  • There is a user-facing docs PR against https://github.com/Kong/developer.konghq.com - PUT DOCS PR HERE

…emini

Currently, ai-proxy only records prompt/completion token counts, not prompt cache tokens, which would help users observe the cache hit ratio and improve performance.

This PR introduces `prompt_cache_tokens` to record cached prompt tokens for Gemini and OpenAI.
Copilot AI review requested due to automatic review settings December 9, 2025 07:04

Copilot AI left a comment


Pull request overview

This PR adds support for recording prompt cache token counts for Gemini and OpenAI LLM providers, enabling users to observe cache hit ratios and improve performance monitoring.

Key Changes:

  • Added prompt_cache_tokens field to the metrics schema and analytics serialization (see the sketch after this list)
  • Implemented cache token extraction for Gemini (via cachedContentTokenCount) and OpenAI (via prompt_tokens_details.cache_tokens)
  • Updated test expectations to include prompt_cache_tokens = 0 in expected chat statistics
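A hedged sketch of the serialization side, assuming a flat usage table like the one the ai-proxy specs assert on; the real filter in kong/llm/plugin/shared-filters/serialize-analytics.lua may structure this differently. The zero fallback matches the `prompt_cache_tokens = 0` that the updated tests expect when a provider reports no cached tokens.

```lua
-- Hypothetical shape of the serialized usage block once the new field is
-- included; prompt_cache_tokens falls back to 0 when nothing was cached.
local function serialize_usage(stats)
  return {
    prompt_tokens       = stats.prompt_tokens or 0,
    completion_tokens   = stats.completion_tokens or 0,
    total_tokens        = stats.total_tokens or 0,
    prompt_cache_tokens = stats.prompt_cache_tokens or 0,
  }
end
```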

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Summary per file:

| File | Description |
| --- | --- |
| kong/llm/plugin/observability.lua | Added llm_prompt_cache_tokens_count to the metrics schema |
| kong/llm/drivers/gemini.lua | Introduced extract_usage helper function to extract cache tokens from Gemini's cachedContentTokenCount |
| kong/llm/adapters/gemini.lua | Added prompt_cache_tokens and total_tokens extraction from Gemini response metadata |
| kong/llm/drivers/shared.lua | Added cache tokens to analytics container and observability metrics |
| kong/llm/plugin/shared-filters/parse-json-response.lua | Added cache token extraction for both native formats (Gemini) and OpenAI format (via prompt_tokens_details.cache_tokens) |
| kong/llm/plugin/shared-filters/serialize-analytics.lua | Included prompt_cache_tokens in serialized usage analytics |
| kong/plugins/ai-request-transformer/filters/transform-request.lua | Added llm_prompt_cache_tokens_count to filter output schema and context |
| kong/plugins/ai-response-transformer/filters/transform-response.lua | Added llm_prompt_cache_tokens_count to filter output schema and context |
| kong/reports.lua | Added AI_PROMPT_CACHE_TOKENS_COUNT_KEY constant and tracking for cache token counts |
| kong/plugins/prometheus/exporter.lua | Added Prometheus metric collection for prompt_cache_tokens |
| spec/03-plugins/38-ai-proxy/02-openai_integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
| spec/03-plugins/38-ai-proxy/09-streaming_integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
| spec/03-plugins/38-ai-proxy/11-gemini_integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
| spec/03-plugins/39-ai-request-transformer/02-integration_spec.lua | Updated expected stats to include prompt_cache_tokens = 0 |
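To connect the exporter row above to the sample time series in the summary, here is a hypothetical sketch in the style of the nginx-lua-prometheus counter API that Kong's exporter builds on; the real exporter.lua registers and increments its counters centrally, so treat the names, help text, and wiring here as assumptions.

```lua
-- Hypothetical sketch (assumes an OpenResty worker with a shared dict named
-- "prometheus_metrics"); shows only the label shape of the sample metric above.
local prometheus = require("prometheus").init("prometheus_metrics")

local ai_llm_tokens = prometheus:counter(
  "kong_ai_llm_tokens_total",
  "AI tokens used per provider/model in Kong",
  { "ai_provider", "ai_model", "cache_status", "vector_db",
    "embeddings_provider", "embeddings_model", "token_type", "workspace" })

-- Record 1861 cached prompt tokens for a Gemini request in the default workspace.
ai_llm_tokens:inc(1861, { "gemini", "gemini-2.5-flash", "", "", "", "",
                          "prompt_cache_tokens", "default" })
```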
