Conversation

@nazq nazq commented Dec 31, 2025

Summary

  • Add ChatMetrics struct for tracking token usage and timing (rough shape sketched after this list)
  • Add Tracked<S> stream wrapper for metrics-aware streaming
  • Add MetricsProvider trait with chat_with_metrics() and chat_stream_with_metrics()
  • Add .enable_metrics(bool) builder method
  • Integrate with existing StreamChunk::Usage variant for token tracking
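
For orientation, ChatMetrics is roughly the shape below. The field names are inferred from the usage examples later in this description; the exact definition lives in src/metrics.rs.

use std::time::Duration;

// Rough shape only (inferred from the examples below); see src/metrics.rs
// for the real definition.
pub struct ChatMetrics {
    pub input_tokens: Option<u64>,   // prompt tokens, when the backend reports them
    pub output_tokens: Option<u64>,  // completion tokens, when reported
    pub total_tokens: Option<u64>,   // input + output, when reported
    pub time_to_first_token: Option<Duration>, // streaming only
    pub total_duration: Duration,    // wall-clock time for the whole request
}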

Motivation

I realized I was building most of these features separately in multiple app engines, so I decided to push them down into this library and share them. Users need visibility into LLM request performance and costs. This feature provides:

  • Token usage tracking (input/output/total)
  • Time to first token measurement
  • Total request duration
  • Works with both streaming and non-streaming APIs

Approach

Opt-in design: Metrics collection is disabled by default. Users must explicitly enable it via .enable_metrics(true) on the builder. This ensures zero overhead for users who don't need metrics.

Non-breaking: All new APIs are additive. Existing chat() and chat_stream() methods work unchanged.
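
For reference, here is the trait surface implied by the examples below. The return and message types (ChatMessage, ChatResponse, StreamChunk, LLMError) follow the crate's existing chat API as I understand it, and the ChatStream alias is hypothetical; treat every signature here as an assumption, not the actual definition in src/metrics.rs.

use futures::Stream;
use std::pin::Pin;

// Hypothetical alias for a boxed chat stream (assumption).
type ChatStream = Pin<Box<dyn Stream<Item = Result<StreamChunk, LLMError>> + Send>>;

// Sketch only; inferred from the usage examples in this description.
#[async_trait::async_trait]
pub trait MetricsProvider {
    // Some(metrics) when .enable_metrics(true) was set on the builder,
    // None otherwise (assumption, following the opt-in design above).
    async fn chat_with_metrics(
        &self,
        messages: &[ChatMessage],
    ) -> Result<(Box<dyn ChatResponse>, Option<ChatMetrics>), LLMError>;

    // Wraps the normal chat stream in Tracked so timing and token usage
    // accumulate as chunks arrive.
    async fn chat_stream_with_metrics(
        &self,
        messages: &[ChatMessage],
    ) -> Result<Tracked<ChatStream>, LLMError>;
}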

Usage

Enable Metrics

let llm = LLMBuilder::new()
    .backend(LLMBackend::OpenAI)
    .api_key(&key)
    .enable_metrics(true)
    .build()?;

Non-Streaming

use llm::metrics::MetricsProvider;

let messages = vec![ChatMessage::user().content("Hello").build()];
let (response, metrics) = llm.chat_with_metrics(&messages).await?;

if let Some(m) = metrics {
    println!("Input tokens: {}", m.input_tokens.unwrap_or(0));
    println!("Output tokens: {}", m.output_tokens.unwrap_or(0));
    println!("Time to first token: {:?}", m.time_to_first_token);
    println!("Total duration: {:?}", m.total_duration);
}
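
Because the fields are plain counts and Durations, derived stats are easy to compute inside the same if let Some(m) block. This fragment is illustrative, assuming output_tokens is populated and the duration is non-zero:

// Illustrative: derive output tokens per second from the fields above.
if let Some(out) = m.output_tokens {
    let secs = m.total_duration.as_secs_f64();
    if secs > 0.0 {
        println!("Throughput: {:.1} tok/s", out as f64 / secs);
    }
}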

Streaming

use llm::chat::Tracked;
use futures::StreamExt;

let tracked: Tracked<_> = llm.chat_stream_with_metrics(&messages).await?;

futures::pin_mut!(tracked);
while let Some(chunk) = tracked.next().await {
    match chunk? {
        StreamChunk::Delta(text) => print!("{}", text),
        StreamChunk::Usage { input_tokens, output_tokens, .. } => {
            println!("Tokens: {}/{}", input_tokens, output_tokens);
        }
        _ => {}
    }
}

let metrics = tracked.metrics();
println!("Total duration: {:?}", metrics.total_duration);

Changes

New Files

File                 Purpose
src/metrics.rs       ChatMetrics struct and MetricsProvider trait
src/chat/tracked.rs  Tracked<S> stream wrapper

Modified Files

File             Changes
src/lib.rs       Export metrics module
src/chat/mod.rs  Export Tracked
src/builder.rs   Add enable_metrics field and method

Test Plan

  • cargo check passes
  • cargo clippy passes
  • cargo test passes
  • cargo build --release passes

Dependencies

This feature integrates with PR #96 (stream_usage), which adds the StreamChunk::Usage variant. Metrics collection leverages that variant to capture token counts during streaming.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 219 to 222
// Record first chunk time
if this.first_chunk_time.is_none() {
    *this.first_chunk_time = Some(Instant::now());
}

P2: Record TTFT only when a text token is seen

Here first_chunk_time is set on the very first streamed item, even if that item is a tool-use event or an empty/usage-only chunk. For tool-only responses or streams that send ToolUseStart/Done before any text, time_to_first_token becomes Some(...) even though no token was emitted, which makes the metric inaccurate. Consider setting first_chunk_time only when extract_text() returns Some (and ideally non-empty), so TTFT reflects the first actual text token.
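
A minimal sketch of that guard, reusing the names from the quoted snippet (chunk stands in for the polled item, and extract_text() is the accessor the review refers to):

// Record TTFT only for the first non-empty text token, skipping
// tool-use and usage-only chunks.
if this.first_chunk_time.is_none() {
    if let Some(text) = chunk.extract_text() {
        if !text.is_empty() {
            *this.first_chunk_time = Some(Instant::now());
        }
    }
}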

@nazq force-pushed the feat/metrics-collection branch from 4073506 to 871b0b2 (December 31, 2025 17:10)
@nazq force-pushed the feat/metrics-collection branch from 871b0b2 to b0b056b (December 31, 2025 17:10)