revert: Token estimation values should use existing auto-calculation #29
base: main
Conversation
- Add estimated_output_tokens, vlm_estimated_output_tokens, vlm_max_tokens, and vlm_image_tokens_estimate to LLMConfig
- Add image_tokens_estimate to DeepSeekOCRConfig
- Update multi_tenant.py to use config values instead of hardcoded ones
- Update deepseek_ocr_client.py to use config values instead of hardcoded ones
- Update env.example with descriptions of the new environment variables

Closes #issue

Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
BukeLy left a comment
This estimation config should not be added this way at all. Per the design goal, the values should be computed from these two environment variables:

LLM_REQUESTS_PER_MINUTE=800    # max requests per minute (default 800)
LLM_TOKENS_PER_MINUTE=40000    # max tokens per minute, input + output (default 40000)
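For illustration, a minimal sketch of how a concurrency cap could be derived from those two variables alone, with no extra per-service settings. The function name, the avg_tokens_per_request parameter, and the request-duration assumption are all hypothetical, not the repository's actual rate_limiter.py API:

```python
import os

def estimate_concurrency(avg_tokens_per_request: int,
                         avg_request_seconds: float = 2.0) -> int:
    """Hypothetical sketch: cap concurrency using only the two public limits."""
    rpm = int(os.getenv("LLM_REQUESTS_PER_MINUTE", "800"))
    tpm = int(os.getenv("LLM_TOKENS_PER_MINUTE", "40000"))

    # Requests per minute that the token budget (input + output) allows.
    requests_by_tokens = tpm / avg_tokens_per_request
    # The stricter of the two per-minute limits wins.
    allowed_per_minute = min(rpm, requests_by_tokens)
    # Convert the per-minute budget into concurrent slots, assuming each
    # request holds a slot for avg_request_seconds on average.
    return max(1, int(allowed_per_minute * avg_request_seconds / 60))
```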
env.example
Outdated
# --- LLM token estimation config (for rate limiting) ---
# Estimated output token counts, used in rate-limit calculations.
# If the estimate is too high, concurrency is throttled; if too low, 429 errors may be triggered.
# LLM_ESTIMATED_OUTPUT_TOKENS=3000        # LLM output estimate (entity extraction ~3000 tokens; default 3000)
# LLM_VLM_ESTIMATED_OUTPUT_TOKENS=500     # VLM output estimate (image captions are short; default 500)
# LLM_VLM_MAX_TOKENS=500                  # VLM API max output tokens (default 500)
# LLM_VLM_IMAGE_TOKENS_ESTIMATE=200       # VLM image-input token estimate (default 200)
There is indeed a conflict. These config options have been reverted in e2cb6e7.

rate_limiter.py's avg_tokens_map already has built-in token estimates for each service, and the concurrency level is auto-calculated from LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE, so these internal values do not need to be exposed separately; see the sketch below.
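To make that concrete, here is a hedged sketch of what such a built-in map and the derived concurrency could look like, using the per-service estimates quoted in the summary at the end of this thread; the real rate_limiter.py may structure this differently:

```python
# Hypothetical sketch; values are the per-service estimates cited later
# in this thread (LLM: 3500, Embedding: 20000, Rerank: 500, DS_OCR: 3500).
avg_tokens_map = {
    "llm": 3500,
    "embedding": 20000,
    "rerank": 500,
    "ds_ocr": 3500,
}

def max_concurrency(service: str,
                    tokens_per_minute: int = 40000,
                    requests_per_minute: int = 800) -> int:
    """Auto-calculate a concurrency level from the two public limits."""
    avg = avg_tokens_map[service]
    # Per-minute request budget implied by the token limit.
    by_tokens = tokens_per_minute // avg
    return max(1, min(requests_per_minute, by_tokens))

# e.g. max_concurrency("llm") == min(800, 40000 // 3500) == 11
```

Under these assumed numbers, the token budget rather than the request budget is the binding limit for the LLM service.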
src/multi_tenant.py
Outdated
# Token estimation config
self.llm_estimated_output_tokens = config.llm.estimated_output_tokens
self.vlm_estimated_output_tokens = config.llm.vlm_estimated_output_tokens
self.vlm_max_tokens = config.llm.vlm_max_tokens
self.vlm_image_tokens_estimate = config.llm.vlm_image_tokens_estimate
Same problem as in env.example.
The related changes have been reverted in e2cb6e7.

Per the review feedback, these estimation values should be derived automatically from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE rather than exposed as extra config options. rate_limiter.py already provides avg_tokens_map for auto-calculating the concurrency level.

Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
All changes have been reverted in e2cb6e7. The rate-limiting concurrency should indeed be derived from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE.
@copilot How should this be improved going forward? The issue still exists; how should it be resolved?
Suggested improvement: consolidate the scattered hardcoded estimation values into rate_limiter.py's avg_tokens_map.

Current problem: per-service token estimates are hardcoded in several modules rather than defined in one place.

Improvement plan: keep avg_tokens_map as the single source of these estimates, so they stay consistent with the existing LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE auto-calculation.

Should I implement this plan?
After review, the proposed changes to extract hardcoded token estimation values into configuration have been reverted.

Reason

The rate-limiting concurrency is already calculated automatically from the existing environment variables:

LLM_REQUESTS_PER_MINUTE (default: 800)
LLM_TOKENS_PER_MINUTE (default: 40000)

The rate_limiter.py module contains an internal avg_tokens_map that provides token estimates for each service (LLM: 3500, Embedding: 20000, Rerank: 500, DS_OCR: 3500). These internal values are used to auto-calculate the appropriate concurrency level.

Conclusion

No additional configuration is needed. The existing LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE environment variables, combined with the internal token estimates in rate_limiter.py, already provide the correct rate-limiting behavior.