feat: PDF text extraction, content size guardrails, and dynamic file offloading by coctostan · Pull Request #4 · coctostan/pi-web-tools

coctostan · 2026-02-19T16:23:29Z

Summary

- PDF text extraction: `fetch_content` now detects `application/pdf` responses and extracts their text via `pdf-parse`, handling corrupt, empty, and oversized PDFs gracefully.
- `get_search_content` size guardrails: a new `maxChars` parameter (default 30K, hard cap 100K) prevents oversized content from flooding the LLM context.
- Dynamic file offloading: `tool_result` interception writes large results (>30K) to secure temp files and replaces each with a preview plus the file path, so the model can `grep` the file via bash instead of loading everything into context.
- Documentation: README updated with PDF support, the `maxChars` parameter, a content size management section, and a changelog.
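
The PDF detection step from the summary could look roughly like the sketch below. This is not the code in `extract.ts`: the helper name `looksLikePdf` is hypothetical, and the fallback check for the `%PDF-` magic bytes is an assumption added for illustration (the summary only mentions detecting `application/pdf`). Text extraction via `pdf-parse` is left out so the example stays self-contained.

```typescript
// Hypothetical helper: decide whether a fetched response should be treated as a PDF.
// The Content-Type check mirrors the PR summary; the magic-byte fallback is an assumption.
function looksLikePdf(contentType: string | null, body: Uint8Array): boolean {
  // Drop parameters such as "; charset=binary" and normalize case.
  const [rawType = ""] = (contentType ?? "").split(";");
  if (rawType.trim().toLowerCase() === "application/pdf") {
    return true;
  }
  // PDF files start with the ASCII bytes "%PDF-".
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  return body.length >= magic.length && magic.every((byte, i) => body[i] === byte);
}
```

A caller would run this check before choosing between the HTML path and the PDF path, so a mislabeled response can still be routed to the PDF extractor.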

Files Changed

| File | Change |
| --- | --- |
| `package.json` | `pdf-parse` dependency, version bump to 1.2.0 |
| `extract.ts` | PDF detection and `pdf-parse` text extraction |
| `extract.test.ts` | 3 PDF extraction tests |
| `truncation.ts` | New: content truncation with configurable limit |
| `truncation.test.ts` | 7 truncation tests |
| `offload.ts` | New: secure temp file offloading for large content |
| `offload.test.ts` | 8 offload tests |
| `tool-params.ts` | `normalizeGetSearchContentInput` with `maxChars` validation |
| `index.ts` | `maxChars` schema, truncation wiring, `tool_result` handler, cleanup |
| `README.md` | PDF support, `maxChars` docs, content size management, changelog |
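
As a rough illustration of the `maxChars` guardrail, the sketch below clamps a requested limit and truncates content. Only the 30K default and 100K hard cap come from this PR's summary. The constant names, `resolveMaxChars`, `truncateContent`, and the result shape are hypothetical and are not taken from `truncation.ts` or `tool-params.ts`.

```typescript
// Limits as stated in the PR summary; the identifiers are illustrative.
const DEFAULT_MAX_CHARS = 30_000;
const HARD_CAP_MAX_CHARS = 100_000;

interface TruncationResult {
  text: string;
  truncated: boolean;
  originalLength: number;
}

// Fall back to the default for missing or invalid input, and never exceed the hard cap.
function resolveMaxChars(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested) || requested < 1) {
    return DEFAULT_MAX_CHARS;
  }
  return Math.min(Math.floor(requested), HARD_CAP_MAX_CHARS);
}

// Cut content to the resolved limit and report whether anything was dropped.
function truncateContent(content: string, maxChars?: number): TruncationResult {
  const limit = resolveMaxChars(maxChars);
  if (content.length <= limit) {
    return { text: content, truncated: false, originalLength: content.length };
  }
  return { text: content.slice(0, limit), truncated: true, originalLength: content.length };
}
```

Returning `originalLength` alongside the `truncated` flag lets the tool tell the model how much was cut, which is one plausible way to prompt a follow-up request with a larger `maxChars`.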

Test Plan

- 19 new tests added (3 PDF, 7 truncation, 8 offload, 1 constant)
- All 110 tests passing
- Manual: fetch a real PDF URL and confirm text extraction
- Manual: fetch a large page and confirm file offload behavior
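
For anyone checking the offload behavior by hand, the sketch below shows the general shape of the technique: a private temp directory, files created with mode `0o600` and the exclusive `wx` flag, and a preview plus path returned in place of the full content. It is not the code in `offload.ts`. The 30K threshold comes from the summary, while the function names, the preview length, and the message format are assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const OFFLOAD_THRESHOLD_CHARS = 30_000; // from the PR summary (>30K)
const PREVIEW_CHARS = 500; // assumption: preview length is illustrative

let offloadDir: string | undefined;
let fileCounter = 0;

// Lazily create a private directory; on POSIX systems mkdtemp creates it with mode 0700.
function getOffloadDir(): string {
  if (offloadDir === undefined) {
    offloadDir = fs.mkdtempSync(path.join(os.tmpdir(), "web-tools-"));
  }
  return offloadDir;
}

// Return small content unchanged; write large content to a file and return a preview plus path.
function offloadIfLarge(content: string): string {
  if (content.length <= OFFLOAD_THRESHOLD_CHARS) {
    return content;
  }
  const filePath = path.join(getOffloadDir(), `result-${fileCounter++}.txt`);
  // "wx" fails if the file already exists; 0o600 restricts access to the owner.
  fs.writeFileSync(filePath, content, { encoding: "utf8", mode: 0o600, flag: "wx" });
  const preview = content.slice(0, PREVIEW_CHARS);
  return `${preview}\n\n[Full content (${content.length} chars) saved to ${filePath}]`;
}

// Remove the directory and everything in it, for example on session shutdown.
function cleanupOffloadedFiles(): void {
  if (offloadDir !== undefined) {
    fs.rmSync(offloadDir, { recursive: true, force: true });
    offloadDir = undefined;
  }
}
```

With a result replaced this way, the model can run a targeted search against the saved path rather than receiving the whole payload in context.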


socket-security · 2026-02-19T16:24:00Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	pdf-parse@2.4.5



coctostan added 10 commits February 19, 2026 10:07

docs: design for PDF support and content size guardrails

62ada51

docs: add dynamic filtering design via tool_result interception

9566388

[pi] Design updated and committed. ✅

863be04

chore: add pdf-parse dependency

550bfe0

feat: add PDF text extraction via pdf-parse

7a780d5

feat: add truncation module for content size guardrails

7490051

feat: add maxChars parameter to get_search_content with size guardrails

2b65958

feat: add file offload module for large content

596b925

feat: wire up tool_result file offloading and cleanup

a5cf0ba

fix: harden temp file creation with secure dir, mode 0o600, exclusive…

d3b022f

… flag

coctostan added 2 commits February 19, 2026 11:28

chore: bump version to 1.2.0

cb59f66

docs: update README with PDF support, maxChars, file offloading, and …

fe7300d

…changelog

coctostan wants to merge 12 commits into `main` from `feat/pdf-support-and-size-guardrails`
