
Copilot AI commented Dec 15, 2025

The parser selection threshold (500 KB) was hardcoded in src/rag.py, so it could not be customized for different use cases.

Changes

  • src/config.py: Add ParserConfig class with size_threshold_kb field (default: 500)
  • src/rag.py: Replace hardcoded 500 * 1024 with config.parser.size_threshold_kb * 1024
  • env.example: Document new PARSER_SIZE_THRESHOLD_KB variable
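Below is a minimal sketch of what the config side might look like. It assumes a plain dataclass that reads from the environment. The PR does not show how src/config.py actually loads PARSER_SIZE_THRESHOLD_KB, so the `from_env` classmethod and the positive-value check are hypothetical. Only `ParserConfig`, `size_threshold_kb`, and the 500 default come from the change list.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserConfig:
    """Parser selection settings (hypothetical sketch, not the repo's src/config.py)."""

    # Files smaller than this many kilobytes are routed to DeepSeek-OCR.
    size_threshold_kb: int = 500

    @classmethod
    def from_env(cls) -> "ParserConfig":
        """Read PARSER_SIZE_THRESHOLD_KB, falling back to the 500 KB default.

        Hypothetical loader: the real project may use a settings library instead.
        """
        raw = os.environ.get("PARSER_SIZE_THRESHOLD_KB")
        if raw is None or raw.strip() == "":
            return cls()  # unset or empty: keep the default behavior
        value = int(raw)  # int() tolerates surrounding whitespace
        if value <= 0:
            # Assumed validation: a zero or negative threshold makes no sense.
            raise ValueError("PARSER_SIZE_THRESHOLD_KB must be a positive integer")
        return cls(size_threshold_kb=value)
```

In src/rag.py the PR then compares against `config.parser.size_threshold_kb * 1024`, so the value stays in kilobytes in the config and is converted to bytes only at the comparison.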

Usage

# Set threshold to 1MB for larger documents
PARSER_SIZE_THRESHOLD_KB=1024

Files below the threshold are parsed with DeepSeek-OCR; files at or above it use MinerU. The default of 500 KB keeps the existing behavior unchanged.
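The selection rule can be sketched as a small function. `select_parser` and the returned strings are hypothetical labels, not identifiers from src/rag.py. The comparison mirrors the issue's `file_size < 500 * 1024`, with 500 replaced by the configured value.

```python
def select_parser(file_size_bytes: int, size_threshold_kb: int = 500) -> str:
    """Pick a parser by file size (hypothetical helper, not the repo's code).

    Mirrors the strict comparison from the issue: file_size < threshold_kb * 1024.
    """
    threshold_bytes = size_threshold_kb * 1024
    if file_size_bytes < threshold_bytes:
        return "deepseek-ocr"  # smaller files
    return "mineru"  # files at or above the threshold
```

Because the comparison is strict, a file of exactly the threshold size goes to MinerU. With PARSER_SIZE_THRESHOLD_KB=1024, an 800 KB file (819,200 bytes) moves from MinerU to DeepSeek-OCR.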

Original prompt

This section details the original issue you should resolve

<issue_title>File size threshold hardcoded - rigid parser selection strategy</issue_title>
<issue_description>## Problem description
The hardcoded file size threshold makes the parser selection strategy inflexible.

Affected files

  • src/rag.py 行 185, 220

Hardcoded value

  • if file_size < 500 * 1024 (500KB)

Problem

Different business scenarios may need different thresholds.

Solution

Make the threshold configurable via the PARSER_SIZE_THRESHOLD_KB environment variable.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits December 15, 2025 09:41
Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix hardcoded file size threshold for parser selection" to "feat: Make parser size threshold configurable via PARSER_SIZE_THRESHOLD_KB" on Dec 15, 2025
Copilot AI requested a review from BukeLy December 15, 2025 09:46


Development

Successfully merging this pull request may close these issues.

File size threshold hardcoded - rigid parser selection strategy
