
Conversation

Contributor

Copilot AI commented Dec 15, 2025

After review, the proposed changes to extract hardcoded token estimation values to configuration have been reverted.

Reason

The rate limiting concurrency is already automatically calculated using existing environment variables:

  • LLM_REQUESTS_PER_MINUTE (default: 800)
  • LLM_TOKENS_PER_MINUTE (default: 40000)

The rate_limiter.py module contains an internal avg_tokens_map that provides token estimation values for each service (LLM: 3500, Embedding: 20000, Rerank: 500, DS_OCR: 3500). These internal values are used to auto-calculate the appropriate concurrency level.
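As a rough illustration, here is a minimal sketch of how such an auto-calculation could be derived from the two environment variables and the map above; the function name and the exact formula are assumptions for illustration, not the actual rate_limiter.py code:

import os

# Internal per-service average token estimates (values as listed above).
avg_tokens_map = {
    "llm": 3500,
    "embedding": 20000,
    "rerank": 500,
    "ds_ocr": 3500,
}

def estimate_concurrency(service: str) -> int:
    """Derive a per-service concurrency cap from the two rate-limit env vars."""
    rpm = int(os.getenv("LLM_REQUESTS_PER_MINUTE", "800"))
    tpm = int(os.getenv("LLM_TOKENS_PER_MINUTE", "40000"))
    # The token budget caps how many average-sized requests fit in a minute;
    # the request budget caps the raw request count. Take the stricter one.
    return max(1, min(rpm, tpm // avg_tokens_map[service]))

# With the defaults, the LLM service would be capped at 40000 // 3500 = 11.
print(estimate_concurrency("llm"))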

Conclusion

No additional configuration is needed. The existing LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE environment variables, combined with the internal token estimation in rate_limiter.py, already provide the correct behavior for rate limiting.

Original prompt

This section details the original issue you should resolve

<issue_title>Token estimation values hardcoded multiple places - inaccurate rate limiting</issue_title>
<issue_description>## Problem description
Token estimation values are hardcoded in multiple files, leading to inaccurate rate limiting.

Affected files and locations

  • multi_tenant.py line 113: estimated_output = 3000 (LLM)
  • multi_tenant.py line 313: estimated_output = 500 (VLM)
  • multi_tenant.py line 339: max_tokens = 500
  • deepseek_ocr_client.py lines 285, 352

Problems

  • The estimate cannot be adjusted per model
  • The same value "3000" is duplicated in several places, which makes maintenance hard
  • With small models the estimate is too high and concurrency is throttled; with large models it is too low

Proposed solution

These should be extracted into .env configuration:

  • LLM_ESTIMATED_OUTPUT_TOKENS
  • VLM_ESTIMATED_OUTPUT_TOKENS, etc.</issue_description>

Comments on the Issue (you are @copilot in this section)



- Add estimated_output_tokens, vlm_estimated_output_tokens, vlm_max_tokens, vlm_image_tokens_estimate to LLMConfig
- Add image_tokens_estimate to DeepSeekOCRConfig
- Update multi_tenant.py to use the config values instead of hardcoded literals
- Update deepseek_ocr_client.py to use the config values instead of hardcoded literals
- Update env.example with descriptions of the new environment variables

Closes #issue

Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix hardcoded token estimation values for accurate rate limiting feat: Extract hardcoded token estimation values to configuration Dec 15, 2025
Copilot AI requested a review from BukeLy December 15, 2025 09:47
Owner

@BukeLy left a comment


The estimation config should not be added this way at all. By the design goal, it should be computed from

LLM_REQUESTS_PER_MINUTE=800        # Max requests per minute (default 800)
LLM_TOKENS_PER_MINUTE=40000        # Max tokens per minute (input + output, default 40000)

these two environment variables.

env.example (Outdated)
Comment on lines 34 to 41
# --- LLM token estimation config (used for rate limiting) ---
# Estimated output token counts used in rate-limit calculations
# If the estimate is too high, concurrency is throttled; if too low, 429 errors may occur
# LLM_ESTIMATED_OUTPUT_TOKENS=3000 # LLM output estimate (entity extraction ~3000 tokens, default 3000)
# LLM_VLM_ESTIMATED_OUTPUT_TOKENS=500 # VLM output estimate (image descriptions are short, default 500)
# LLM_VLM_MAX_TOKENS=500 # VLM API max output tokens (default 500)
# LLM_VLM_IMAGE_TOKENS_ESTIMATE=200 # VLM image input token estimate (default 200)

Owner


Don't these values conflict with LLM_REQUESTS_PER_MINUTE=800 (max requests per minute, default 800) and
LLM_TOKENS_PER_MINUTE=40000 (max tokens per minute, input + output, default 40000)?
Since concurrency is already auto-calculated, why are these estimation values needed? @copilot @claude

Contributor Author


There is indeed a conflict. These configuration items have been reverted in e2cb6e7.

The avg_tokens_map in rate_limiter.py already has built-in token estimates for each service and auto-calculates concurrency from LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE, so these internal values do not need to be exposed separately.

Comment on lines 67 to 72
# Token estimation config
self.llm_estimated_output_tokens = config.llm.estimated_output_tokens
self.vlm_estimated_output_tokens = config.llm.vlm_estimated_output_tokens
self.vlm_max_tokens = config.llm.vlm_max_tokens
self.vlm_image_tokens_estimate = config.llm.vlm_image_tokens_estimate

Owner


Same issue as in .env.example.

Contributor Author


The related changes have been reverted in e2cb6e7.

Per the feedback, these estimates should be auto-calculated from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE; no additional configuration items are needed.

rate_limiter.py already has an avg_tokens_map used to auto-calculate concurrency.

Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 15, 2025

The estimation config should not be added this way at all. By the design goal, it should be computed from

LLM_REQUESTS_PER_MINUTE=800        # Max requests per minute (default 800)...

All changes have been reverted in e2cb6e7.

Rate-limiting concurrency should indeed be auto-calculated from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE; the avg_tokens_map in rate_limiter.py already contains the internal per-service token estimates used in that calculation.

Copilot AI changed the title feat: Extract hardcoded token estimation values to configuration revert: Token estimation values should use existing auto-calculation Dec 15, 2025
Copilot AI requested a review from BukeLy December 15, 2025 10:05
@BukeLy
Owner

BukeLy commented Dec 15, 2025

@copilot How should this be improved going forward? The issue still exists; how do we resolve it?

Contributor Author

Copilot AI commented Dec 15, 2025

@copilot How should this be improved going forward? The issue still exists; how do we resolve it?

Suggested improvement: consolidate the scattered hardcoded values into the avg_tokens_map in rate_limiter.py and reuse it everywhere.

Current issues:

  • multi_tenant.py:113 - estimated_output = 3000
  • multi_tenant.py:312-313 - image_tokens = 200, estimated_output = 500
  • deepseek_ocr_client.py:285,352 - 1000 + self.config.max_tokens

Improvement plan:

  1. Export avg_tokens_map from rate_limiter.py, or add a get_avg_tokens(service) function
  2. Have multi_tenant.py and deepseek_ocr_client.py call it to obtain estimation values
  3. That way only rate_limiter.py needs to be maintained, and no new environment variables are added

This stays consistent with the LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE auto-calculation logic while fixing the scattered hardcoding; a sketch of the approach follows.
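A minimal sketch of steps 1 and 2, assuming the map values quoted earlier in this thread; get_avg_tokens and the service keys are illustrative names, not existing code:

# rate_limiter.py (sketch) -- single source of truth for per-service token estimates.
# Values mirror the internal avg_tokens_map described above; get_avg_tokens is an
# assumed helper name, not an existing API.
avg_tokens_map = {
    "llm": 3500,
    "embedding": 20000,
    "rerank": 500,
    "ds_ocr": 3500,
}

def get_avg_tokens(service: str) -> int:
    """Return the average token estimate used in rate-limit calculations."""
    return avg_tokens_map[service]

Call sites such as multi_tenant.py would then replace the literal 3000 with a lookup, e.g. estimated_output = get_avg_tokens("llm"), so the numbers live in one place.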

Should I implement this approach?
