
feat: PDF text extraction, content size guardrails, and dynamic file offloading #4

Open: coctostan wants to merge 12 commits into main from feat/pdf-support-and-size-guardrails

coctostan (Owner) commented Feb 19, 2026

Summary

  • PDF text extraction: fetch_content now detects application/pdf responses and extracts their text via pdf-parse, handling corrupt, empty, and oversized PDFs gracefully
  • get_search_content size guardrails: a new maxChars parameter (default 30K, hard cap 100K) keeps oversized content from flooding the LLM context
  • Dynamic file offloading: a tool_result interceptor writes large results (>30K) to secure temp files and replaces each with a preview plus the file path, so the model can search the file with grep via bash instead of loading everything into context
  • Documentation: README updated with PDF support, the maxChars parameter, a content size management section, and the changelog
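The offloading behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in offload.ts: the names offloadIfLarge, cleanupOffloaded, OFFLOAD_THRESHOLD, and PREVIEW_CHARS are hypothetical, "30K" is read as 30,000 characters, and the preview size is an assumption since the PR does not state it.

```typescript
import { mkdtempSync, writeFileSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Assumed values: the PR gives a 30K threshold but no preview size.
const OFFLOAD_THRESHOLD = 30_000;
const PREVIEW_CHARS = 2_000;

interface OffloadResult {
  text: string;      // what is handed back to the model
  filePath?: string; // set only when the content was written to disk
}

// Hypothetical helper: returns small content unchanged, otherwise writes
// it to a private temp file and returns a preview plus the file path.
function offloadIfLarge(content: string, threshold = OFFLOAD_THRESHOLD): OffloadResult {
  if (content.length <= threshold) {
    return { text: content };
  }
  // mkdtempSync creates a uniquely named directory; on POSIX systems it
  // is accessible only to the current user.
  const dir = mkdtempSync(join(tmpdir(), "offload-"));
  const filePath = join(dir, "content.txt");
  // mode 0o600 restricts the file itself to owner read/write.
  writeFileSync(filePath, content, { encoding: "utf8", mode: 0o600 });

  const preview = content.slice(0, PREVIEW_CHARS);
  const note = `[${content.length} characters total; full text saved to ${filePath}]`;
  return { text: `${preview}\n\n${note}`, filePath };
}

// Hypothetical cleanup: removes the temp directory created for one result.
function cleanupOffloaded(result: OffloadResult): void {
  if (result.filePath) {
    rmSync(dirname(result.filePath), { recursive: true, force: true });
  }
}

// Usage: a 50,000-character result becomes a short preview plus a path,
// while the full text stays readable from disk.
const demo = offloadIfLarge("x".repeat(50_000));
console.log(demo.text.length, readFileSync(demo.filePath!, "utf8").length);
cleanupOffloaded(demo);
```

Returning the path inside the replacement text is what lets the model run grep on the file rather than pulling the whole result into context.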

Files Changed

| File | Change |
| --- | --- |
| package.json | pdf-parse dependency, version bump to 1.2.0 |
| extract.ts | PDF detection and pdf-parse text extraction |
| extract.test.ts | 3 PDF extraction tests |
| truncation.ts | New: content truncation with configurable limit |
| truncation.test.ts | 7 truncation tests |
| offload.ts | New: secure temp file offloading for large content |
| offload.test.ts | 8 offload tests |
| tool-params.ts | normalizeGetSearchContentInput with maxChars validation |
| index.ts | maxChars schema, truncation wiring, tool_result handler, cleanup |
| README.md | PDF support, maxChars docs, content size management, changelog |
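To make the maxChars validation and truncation entries concrete, here is a minimal sketch of how a default and a hard cap might interact. It is not the code in tool-params.ts or truncation.ts: normalizeMaxChars and truncateContent are hypothetical names, "30K" and "100K" are read as 30,000 and 100,000 characters, and falling back to the default on invalid input is an assumption (the real normalizeGetSearchContentInput may reject such input instead).

```typescript
// Assumed readings of "default 30K, hard cap 100K" from the PR summary.
const DEFAULT_MAX_CHARS = 30_000;
const HARD_CAP_MAX_CHARS = 100_000;

// Hypothetical validator: invalid or missing values fall back to the
// default, and valid values are clamped to the hard cap.
function normalizeMaxChars(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
    return DEFAULT_MAX_CHARS;
  }
  // Floor fractional input, but never go below one character.
  const whole = Math.max(1, Math.floor(value));
  return Math.min(whole, HARD_CAP_MAX_CHARS);
}

interface TruncationResult {
  text: string;
  truncated: boolean;
}

// Hypothetical truncation: slice counts UTF-16 code units, so a cut can
// land inside a surrogate pair; a production version may want to guard
// against that.
function truncateContent(content: string, maxChars: number): TruncationResult {
  if (content.length <= maxChars) {
    return { text: content, truncated: false };
  }
  return { text: content.slice(0, maxChars), truncated: true };
}

// Usage: a caller asking for 500,000 characters is held to the hard cap.
const limit = normalizeMaxChars(500_000);
const page = truncateContent("a".repeat(250_000), limit);
console.log(limit, page.text.length, page.truncated);
```

Clamping instead of rejecting keeps the tool usable when a caller overshoots, at the cost of silently returning less than was requested; reporting the truncated flag back to the caller offsets that.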

Test Plan

  • 19 new tests added (3 PDF, 7 truncation, 8 offload, 1 constant)
  • All 110 tests passing
  • Manual: fetch a real PDF URL and confirm text extraction
  • Manual: fetch large page and confirm file offload behavior


socket-security bot commented Feb 19, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | pdf-parse@2.4.5 | 97 | 100 | 100 | 88 | 100 |
