Conversation
Most AIs will still opt for `rg` and other tools alongside largefile. Those use cases can be solved directly in largefile. Adds capabilities:
- regex
- `case_sensitive` option
- invert
- count_only
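A minimal sketch of how these flags could compose in a line search; the helper name `match_lines` and the result shapes are illustrative assumptions, not the PR's actual `search_content` implementation:

```python
import re

def match_lines(lines, pattern, *, regex=False, case_sensitive=True,
                invert=False, count_only=False):
    # Illustrative only: treat the pattern as a literal unless regex=True,
    # honor case sensitivity, optionally invert, optionally return a count.
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    hits = [ln for ln in lines if bool(compiled.search(ln)) != invert]
    return {"count": len(hits)} if count_only else hits
```

`invert` flips the membership test rather than the pattern itself, which is how grep's `-v` behaves.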
Provided detailed `long_lines` stats instead of just a boolean for whether long lines exist. Also detect if the file is binary and exit early.
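A sketch of what the binary check and long-line stats could look like; the `LongLineStats` field names and the NUL-byte heuristic are assumptions for illustration, not necessarily what `src/file_access.py` implements:

```python
from dataclasses import dataclass

@dataclass
class LongLineStats:
    # Hypothetical field names; the PR's actual dataclass may differ.
    count: int
    max_length: int
    threshold: int

def detect_binary(chunk: bytes) -> bool:
    # Common heuristic: a NUL byte in the first block implies a binary file.
    return b"\x00" in chunk

def long_line_stats(lines, threshold=1000):
    over = [ln for ln in lines if len(ln) > threshold]
    return LongLineStats(
        count=len(over),
        max_length=max((len(ln) for ln in lines), default=0),
        threshold=threshold,
    )
```

Checking only the first block for NUL bytes keeps the binary test cheap enough to run before any full parse.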
`edit_content`'s current API is clunky. There is no need for a dual mode when all changes can be specified in an array. This updates `edit_content` to take an array listing all changes (one item for a single change).
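The uniform array API might look like the following minimal sketch; the change-item keys (`search`, `replace`) and the replace-first-occurrence semantics are assumptions, not the PR's actual schema:

```python
def edit_content(path: str, changes: list[dict]) -> None:
    # Apply a list of changes in order; a single edit is just a
    # one-item array, so there is no separate single-change mode.
    with open(path) as f:
        text = f.read()
    for change in changes:
        # Illustrative change shape: {"search": ..., "replace": ...}
        text = text.replace(change["search"], change["replace"], 1)
    with open(path, "w") as f:
        f.write(text)
```

Collapsing both modes into one list parameter means callers and the schema only have one shape to validate.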
Currently the `target` parameter is overloaded to do different things based on the request being made. This is error-prone and just bad design. For simplicity's sake, `target` is now replaced with:
- `offset: int = 1` - starting line number (1-indexed)
- `limit: int = 100` - maximum lines to return
- `pattern: str | None = None` - optional search pattern

Also adds a `head` mode as a companion to tail.
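The new read parameters might behave like this sketch; the function bodies and the substring-filter semantics of `pattern` are illustrative assumptions, not the PR's actual implementation:

```python
def read_content(lines, offset=1, limit=100, pattern=None):
    # offset is 1-indexed; limit caps the window; pattern
    # optionally filters lines within that window.
    window = lines[offset - 1 : offset - 1 + limit]
    if pattern is not None:
        window = [ln for ln in window if pattern in ln]
    return window

def head(lines, n=10):
    # "head" as the companion to tail: the first n lines.
    return read_content(lines, offset=1, limit=n)
```

With explicit `offset`/`limit`/`pattern`, each request shape is visible in the call signature instead of being inferred from an overloaded `target`.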
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4 +/- ##
==========================================
+ Coverage 82.63% 86.77% +4.13%
==========================================
Files 12 12
Lines 1123 1240 +117
==========================================
+ Hits 928 1076 +148
+ Misses 195 164 -31
Pull request overview
This pull request enhances the largefile MCP tool with improved file parsing and search capabilities to reduce the need for external tools like grep, sed, and ripgrep.
Changes:
- Enhanced `search_content` with regex, case-sensitivity, invert, and count-only modes
- Improved `get_overview` with binary file detection and detailed long line statistics
- Simplified `edit_content` to use a uniform array-based changes API (breaking change)
- Clarified `read_content` API with explicit offset/limit/pattern parameters (breaking change)
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Version bump to 0.1.3 |
| pyproject.toml | Version bump and status upgrade to Production/Stable |
| src/data_models.py | Added LongLineStats dataclass, updated FileOverview with binary detection fields |
| src/file_access.py | Added binary file detection, long line stats, and read_head function |
| src/search_engine.py | Added regex matching, case sensitivity, and invert support |
| src/tools.py | Updated all tools with new features and breaking API changes |
| src/mcp_schemas.py | Updated read_content schema with new parameters |
| docs/API.md | Comprehensive documentation updates for all new features |
| README.md | Updated examples and feature descriptions |
| tests/unit/*.py | Extensive test coverage for new features |
| tests/integration/*.py | Updated integration tests for new API |
src/tools.py
Outdated
"lines_returned": len(content_lines),
"total_lines": total_lines,
"mode": mode,
"truncated": end_line < start_line + limit - 1 and end_line < total_lines,
The truncation logic is incorrect. If end_line < total_lines, it means there's more content in the file beyond what was returned, so truncated should be True. The current condition end_line < start_line + limit - 1 and end_line < total_lines will almost never be true given that end_line = min(total_lines, start_line + limit - 1). For example, if start_line=1, limit=100, total_lines=150, then end_line=100, and the condition evaluates to 100 < 100 and 100 < 150 = False, incorrectly indicating no truncation. The fix should be: "truncated": end_line < total_lines
- "truncated": end_line < start_line + limit - 1 and end_line < total_lines,
+ "truncated": end_line < total_lines,
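To see why the suggested fix matters, the reviewer's example can be worked through in a small sketch (the variable names mirror those in the comment):

```python
# Values from the reviewer's example: a 150-line file,
# reading from line 1 with a limit of 100 lines.
total_lines, start_line, limit = 150, 1, 100
end_line = min(total_lines, start_line + limit - 1)  # clamps to 100

# Original condition: the first clause can never hold when
# end_line was just clamped to start_line + limit - 1.
old_truncated = end_line < start_line + limit - 1 and end_line < total_lines

# Suggested fix: truncated simply means the file has more lines
# beyond what was returned.
new_truncated = end_line < total_lines
```

The old expression reports no truncation here even though 50 lines were left unread; the fixed expression reports it correctly.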
tests/unit/test_search_engine.py
Outdated
"""count_only=True returns just the count."""
result = search_content(temp_file, "error", fuzzy=False, count_only=True)
assert "count" in result
assert result["count"] == 3  # error, error:, Error
The comment is misleading. With case_sensitive=True, the search for "error" matches: line 2 "errors", line 3 "if error:", and line 7 "def process_error()". It does NOT match line 4's "Error" (capital E) since the search is case-sensitive. The comment should clarify which occurrences are being counted, e.g., "# errors, error:, process_error" rather than "# error, error:, Error".
- assert result["count"] == 3  # error, error:, Error
+ assert result["count"] == 3  # errors, error:, process_error
an error of this type in a comment in a test file is rather pedantic, no? "Fixing" it improves neither code quality nor correctness. This quality of PR review is not good
tests/unit/test_search_engine.py
Outdated
result = search_content(
    temp_file, "error", fuzzy=False, case_sensitive=False, count_only=True
)
assert result["count"] == 4  # error (2), Error (2)
The comment is misleading. With case_sensitive=False, the search for "error" matches 4 occurrences total: line 2 "errors", line 3 "if error:", line 4 "Error occurred" (case-insensitive match), and line 7 "def process_error()". The comment "# error (2), Error (2)" suggests there are 2 exact "error" matches and 2 exact "Error" matches, but this is not accurate. Consider clarifying to "# errors, error:, Error, process_error" or "# 4 total case-insensitive matches".
- assert result["count"] == 4  # error (2), Error (2)
+ assert result["count"] == 4  # 4 total case-insensitive matches: errors, error:, Error, process_error
Add file parsing capabilities to `largefile` to avoid the need for LLMs to reach for other tools like grep, sed, ripgrep etc.
- `search_content` now supports pattern matching & other search features (regex, invert, case sensitivity, count_only)
- `get_overview` now provides detailed stats (not just a boolean `long_lines`) and checks for binary files to avoid trying to process them
- `edit_content`'s dual mode is clunky; this replaces it with a uniform API for making multiple changes using an array for the list of changes
- The overloaded `target` parameter is replaced with clear explicit params: `offset`, `limit`, `pattern`
largefileto avoid the need for LLMs to reach for other tools like grep, sed, ripgrep etc.search_contentto support pattern matching & other search features (regex, invert, case sensitivity, count_only)get_overview- now provides detailed stats (not just a boolean long_lines) and checks for binary files to avoid trying to process it.edit_content's dual mode is clunky. this replaces it with a uniform api for making multiple changes using an array for the list of changestargetparameter instead provides clear explicit params to use:offset,limit,pattern