Enhancement Request: Cumulative Token Usage Tracking Across Evaluation Runs
Currently, the Evaluation class's run_dataset method accurately tracks and returns token usage for a single evaluation run. However, when run_pipeline in the 5cs evaluation instrument orchestrates multiple Evaluation instances (one per prompt_type), the token usage from the individual runs is never aggregated. While run_dataset aborts if its max_tokens capacity is exceeded, no overall capacity limit is enforced across the entire run_pipeline execution: the max_tokens parameter passed to Evaluation applies independently to each category's run rather than cumulatively across all categories. As a result, a complete pipeline execution can consume an unexpectedly higher total number of tokens than intended.
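To illustrate the gap, here is a minimal sketch of how per-run limits fail to bound the pipeline total. The TokenUsage dataclass and the per-category numbers below are hypothetical stand-ins, not the project's actual structures:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real TokenUsage structure.
@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens

# Illustrative usage for three prompt_type categories.
per_category_usage = {
    "prompt_type_a": TokenUsage(600, 300),
    "prompt_type_b": TokenUsage(700, 250),
    "prompt_type_c": TokenUsage(650, 300),
}

MAX_TOKENS = 1000  # today, applied independently to each run_dataset call

# Every individual run stays under its own limit, so none of them aborts...
assert all(u.total <= MAX_TOKENS for u in per_category_usage.values())

# ...yet the pipeline as a whole consumes nearly 3x the intended budget.
pipeline_total = sum(u.total for u in per_category_usage.values())
print(pipeline_total)  # 2800
```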
Proposed Enhancement
Modify the run_pipeline function to accept and enforce a cumulative max_tokens limit across all evaluation categories. This would involve the following:
1. Initialization: Initialize a total_accumulated_usage (e.g., a TokenUsage object or similar structure) at the beginning of the run_pipeline function.
2. Propagation: Pass this total_accumulated_usage to each Evaluation instance, allowing the Evaluation instance to update it. Alternatively, each run_dataset call could return its accumulated_usage, which run_pipeline then adds to total_accumulated_usage.
3. Capacity Check: After each evaluator.run_dataset call within the loop, run_pipeline should check whether total_accumulated_usage has exceeded the pipeline's overall max_tokens limit.
4. Early Termination: If the cumulative limit is exceeded, run_pipeline should log a warning and break out of the loop, similar to how run_dataset aborts individual runs.
5. Return Value: The run_pipeline function should also return total_accumulated_usage alongside the aggregated_output.
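The steps above could be sketched roughly as follows. This is an assumption-laden outline, not the project's actual implementation: run_category stands in for constructing an Evaluation and calling run_dataset, and the TokenUsage shape is hypothetical:

```python
from dataclasses import dataclass
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the real TokenUsage structure.
@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def add(self, other: "TokenUsage") -> None:
        self.prompt_tokens += other.prompt_tokens
        self.completion_tokens += other.completion_tokens

def run_pipeline(prompt_types, run_category, max_tokens):
    """Sketch of run_pipeline with a cumulative token budget.

    run_category(prompt_type) stands in for building an Evaluation
    and calling run_dataset; it returns (output, TokenUsage).
    """
    total_accumulated_usage = TokenUsage()  # 1. Initialization
    aggregated_output = {}
    for prompt_type in prompt_types:
        # 2. Propagation, here via the return-value variant
        output, usage = run_category(prompt_type)
        aggregated_output[prompt_type] = output
        total_accumulated_usage.add(usage)
        # 3. Capacity check after each category's run
        if total_accumulated_usage.total > max_tokens:
            # 4. Early termination, mirroring run_dataset's abort
            logger.warning(
                "Cumulative token limit exceeded (%d > %d); stopping pipeline.",
                total_accumulated_usage.total, max_tokens,
            )
            break
    # 5. Return usage alongside the aggregated output
    return aggregated_output, total_accumulated_usage
```

The return-value variant is used here because it keeps Evaluation unchanged; the shared-object variant would instead pass total_accumulated_usage into each Evaluation and let it mutate the object in place.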