largefile mcp tool use improvements #4

Merged
peteretelej merged 7 commits into main from mcp-improvements
Jan 17, 2026

@peteretelej
Owner

Adds file-parsing capabilities to largefile so that LLMs no longer need to reach for other tools such as grep, sed, or ripgrep.

  • enhances search_content to support pattern matching and other search features (regex, invert, case sensitivity, count_only)
  • get_overview now provides detailed stats (not just a boolean long_lines) and checks for binary files to avoid trying to process them
  • edit_content's dual mode is clunky; this replaces it with a uniform API that takes an array of changes
  • stops overloading read_content's target parameter and instead provides clear, explicit params: offset, limit, pattern

Most AIs still reach for `rg` and other tools alongside largefile.
Those use cases can be handled directly in largefile. Adds these search capabilities:
- regex
- case_sensitive option
- invert
- count_only
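
To make the four options concrete, here is a minimal sketch of how they could combine in a line-based search. It is not the code from `src/search_engine.py`: the function name `search_lines`, its return shape, and the sample lines are assumptions made for illustration only.

```python
import re


def search_lines(lines, pattern, *, regex=False, case_sensitive=True,
                 invert=False, count_only=False):
    """Hypothetical helper illustrating the four search options."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Escape literal patterns so one code path serves both modes.
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    matches = [
        (number, line)
        for number, line in enumerate(lines, start=1)
        # With invert=True, keep the lines that do NOT match.
        if bool(compiled.search(line)) != invert
    ]
    if count_only:
        return {"count": len(matches)}
    return {"matches": matches}


# Made-up sample input, not the fixture used in the PR's tests.
sample = ["import logging", "errors = []", "if error:", "Error occurred", "done"]

print(search_lines(sample, "error", count_only=True))
print(search_lines(sample, "error", case_sensitive=False, count_only=True))
print(search_lines(sample, r"^if\s", regex=True))
```

Counting matching lines rather than individual occurrences is a choice made in this sketch; the PR text does not say which one `count_only` reports.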

Provides detailed long_lines stats instead of just a boolean indicating whether
long lines exist. Also detects whether the file is binary and exits early.

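
As a rough illustration of the two checks described here, the sketch below detects binary files with a NUL-byte heuristic and collects simple long-line statistics. The heuristic, the 1000-character threshold, the field names, and every identifier are assumptions; the PR does not show how `src/file_access.py` implements either step.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class LongLineStatsSketch:
    """Hypothetical stats shape; the real LongLineStats fields are not shown in the PR."""
    threshold: int
    count: int
    max_length: int


def looks_binary(path, sample_size=8192):
    # Assumed heuristic: a NUL byte in the first chunk marks the file as binary.
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def long_line_stats(path, threshold=1000):
    count = 0
    max_length = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            length = len(line.rstrip("\n"))
            max_length = max(max_length, length)
            if length > threshold:
                count += 1
    return LongLineStatsSketch(threshold, count, max_length)


def get_overview_sketch(path):
    # Exit early for binary files instead of trying to parse them as text.
    if looks_binary(path):
        return {"is_binary": True}
    return {"is_binary": False, "long_lines": long_line_stats(path)}


with tempfile.TemporaryDirectory() as workdir:
    text_path = os.path.join(workdir, "sample.txt")
    with open(text_path, "w", encoding="utf-8") as handle:
        handle.write("short\n" + "x" * 1500 + "\n")
    binary_path = os.path.join(workdir, "sample.bin")
    with open(binary_path, "wb") as handle:
        handle.write(b"\x89PNG\x00\x01\x02")
    text_overview = get_overview_sketch(text_path)
    binary_overview = get_overview_sketch(binary_path)
```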
`edit_content`'s current API is clunky. There is no need for a dual mode
when all changes can be specified in an array.

This updates `edit_content` to take an array listing all changes (one item
for a single change).
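
The array-based shape can be sketched as follows. This is an illustration of the idea only: the `search` and `replace` keys, the `apply_changes` helper, and the per-change result format are hypothetical and may not match the real `edit_content` schema.

```python
def apply_changes(text, changes):
    """Apply a list of search/replace changes in order (hypothetical shape)."""
    results = []
    for change in changes:
        search, replace = change["search"], change["replace"]
        if search in text:
            # Replace only the first occurrence so each change stays targeted.
            text = text.replace(search, replace, 1)
            results.append({"search": search, "applied": True})
        else:
            results.append({"search": search, "applied": False})
    return text, results


source = "a = 1\nb = 2\n"

# A single change is just a one-item array.
single, _ = apply_changes(source, [{"search": "a = 1", "replace": "a = 10"}])

# Several changes go through the same code path.
updated, report = apply_changes(source, [
    {"search": "a = 1", "replace": "a = 10"},
    {"search": "c = 3", "replace": "c = 30"},
])
```

With one code path, a caller never has to choose between a single-edit and a batch mode, which is the point the commit message makes.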

Currently the `target` parameter is overloaded to do different things
based on the request being made. This is error-prone and simply bad
design. For simplicity's sake, `target` is now replaced with:
  - `offset: int = 1` - starting line number (1-indexed)
  - `limit: int = 100` - maximum lines to return
  - `pattern: str | None = None` - optional search pattern

Also adds a `head` mode as a companion to `tail`.
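
A small sketch of how explicit `offset` and `limit` parameters, together with `head` and `tail` modes, might select a window of lines. The defaults mirror the ones listed above; the function name `read_window`, the `"lines"` mode name, and the return shape are assumptions, and `pattern` is left out to keep the example short.

```python
def read_window(lines, offset=1, limit=100, mode="lines"):
    """Select a 1-indexed window of lines (illustrative, not the real read_content)."""
    if offset < 1 or limit < 1:
        raise ValueError("offset and limit must be >= 1")
    total = len(lines)
    if mode == "head":
        start = 1
    elif mode == "tail":
        start = max(1, total - limit + 1)
    else:
        start = offset
    end = min(total, start + limit - 1)
    # Convert the 1-indexed inclusive range to a Python slice.
    return {"start_line": start, "end_line": end, "lines": lines[start - 1:end]}


document = [f"line {number}" for number in range(1, 11)]

middle = read_window(document, offset=3, limit=4)
first_two = read_window(document, limit=2, mode="head")
last_three = read_window(document, limit=3, mode="tail")
```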

codecov bot commented Jan 17, 2026

Codecov Report

❌ Patch coverage is 94.53552% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.77%. Comparing base (3f94605) to head (f6985bc).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tools.py | 93.67% | 5 Missing ⚠️ |
| src/file_access.py | 92.15% | 4 Missing ⚠️ |
| src/search_engine.py | 97.67% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main       #4      +/-   ##
==========================================
+ Coverage   82.63%   86.77%   +4.13%     
==========================================
  Files          12       12              
  Lines        1123     1240     +117     
==========================================
+ Hits          928     1076     +148     
+ Misses        195      164      -31     

| Files with missing lines | Coverage Δ |
|---|---|
| src/data_models.py | 100.00% <100.00%> (ø) |
| src/mcp_schemas.py | 40.00% <ø> (ø) |
| src/search_engine.py | 93.13% <97.67%> (+1.83%) ⬆️ |
| src/file_access.py | 85.36% <92.15%> (+6.97%) ⬆️ |
| src/tools.py | 96.66% <93.67%> (+8.55%) ⬆️ |

... and 2 files with indirect coverage changes

Copilot AI left a comment

Pull request overview

This pull request enhances the largefile MCP tool with improved file parsing and search capabilities to reduce the need for external tools like grep, sed, and ripgrep.

Changes:

  • Enhanced search_content with regex, case-sensitivity, invert, and count-only modes
  • Improved get_overview with binary file detection and detailed long line statistics
  • Simplified edit_content to use uniform array-based changes API (breaking change)
  • Clarified read_content API with explicit offset/limit/pattern parameters (breaking change)

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
|---|---|
| uv.lock | Version bump to 0.1.3 |
| pyproject.toml | Version bump and status upgrade to Production/Stable |
| src/data_models.py | Added LongLineStats dataclass, updated FileOverview with binary detection fields |
| src/file_access.py | Added binary file detection, long line stats, and read_head function |
| src/search_engine.py | Added regex matching, case sensitivity, and invert support |
| src/tools.py | Updated all tools with new features and breaking API changes |
| src/mcp_schemas.py | Updated read_content schema with new parameters |
| docs/API.md | Comprehensive documentation updates for all new features |
| README.md | Updated examples and feature descriptions |
| tests/unit/*.py | Extensive test coverage for new features |
| tests/integration/*.py | Updated integration tests for new API |

src/tools.py Outdated
"lines_returned": len(content_lines),
"total_lines": total_lines,
"mode": mode,
"truncated": end_line < start_line + limit - 1 and end_line < total_lines,
Copilot AI Jan 17, 2026

The truncation logic is incorrect. If end_line < total_lines, it means there's more content in the file beyond what was returned, so truncated should be True. The current condition end_line < start_line + limit - 1 and end_line < total_lines will almost never be true given that end_line = min(total_lines, start_line + limit - 1). For example, if start_line=1, limit=100, total_lines=150, then end_line=100, and the condition evaluates to 100 < 100 and 100 < 150 = False, incorrectly indicating no truncation. The fix should be: "truncated": end_line < total_lines

Suggested change
- "truncated": end_line < start_line + limit - 1 and end_line < total_lines,
+ "truncated": end_line < total_lines,

Owner Author

nice catch!
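
The two conditions from this thread can be compared side by side. The sketch assumes `end_line = min(total_lines, start_line + limit - 1)`, as stated in the review comment; the function names are illustrative and do not come from `src/tools.py`.

```python
def truncated_old(start_line, limit, total_lines):
    end_line = min(total_lines, start_line + limit - 1)
    # end_line always equals one of the two bounds, so both strict
    # inequalities cannot hold at once and this is never True.
    return end_line < start_line + limit - 1 and end_line < total_lines


def truncated_new(start_line, limit, total_lines):
    end_line = min(total_lines, start_line + limit - 1)
    # True whenever the file continues past the returned window.
    return end_line < total_lines


# The case from the review: 150 lines, reading 100 from line 1.
print(truncated_old(1, 100, 150), truncated_new(1, 100, 150))
# A file shorter than the limit is not truncated under either condition.
print(truncated_old(1, 100, 50), truncated_new(1, 100, 50))
```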

"""count_only=True returns just the count."""
result = search_content(temp_file, "error", fuzzy=False, count_only=True)
assert "count" in result
assert result["count"] == 3 # error, error:, Error
Copilot AI Jan 17, 2026

The comment is misleading. With case_sensitive=True, the search for "error" matches: line 2 "errors", line 3 "if error:", and line 7 "def process_error()". It does NOT match line 4's "Error" (capital E) since the search is case-sensitive. The comment should clarify which occurrences are being counted, e.g., "# errors, error:, process_error" rather than "# error, error:, Error".

Suggested change
- assert result["count"] == 3 # error, error:, Error
+ assert result["count"] == 3 # errors, error:, process_error

Owner Author

An error type on a comment in a test file is rather pedantic, no? "Fixing" it improves neither code quality nor correctness. This quality of PR review is not good.

result = search_content(
temp_file, "error", fuzzy=False, case_sensitive=False, count_only=True
)
assert result["count"] == 4 # error (2), Error (2)
Copilot AI Jan 17, 2026

The comment is misleading. With case_sensitive=False, the search for "error" matches 4 occurrences total: line 2 "errors", line 3 "if error:", line 4 "Error occurred" (case-insensitive match), and line 7 "def process_error()". The comment "# error (2), Error (2)" suggests there are 2 exact "error" matches and 2 exact "Error" matches, but this is not accurate. Consider clarifying to "# errors, error:, Error, process_error" or "# 4 total case-insensitive matches".

Suggested change
- assert result["count"] == 4 # error (2), Error (2)
+ assert result["count"] == 4 # 4 total case-insensitive matches: errors, error:, Error, process_error
@peteretelej merged commit 7e06146 into main on Jan 17, 2026
6 checks passed
@peteretelej deleted the mcp-improvements branch on January 17, 2026 10:38
Labels

enhancement New feature or request
