Conversation
Most AIs will still opt for `rg` and other tools alongside largefile. Those use cases can be solved directly in largefile. Adds capabilities:
- regex
- `case_sensitive` option
- invert
- count_only
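A minimal sketch of how these flags could compose in a line search; the helper name `match_lines` and the result shapes are illustrative assumptions, not the PR's actual `search_content` implementation:

```python
import re

def match_lines(lines, pattern, *, regex=False, case_sensitive=True,
                invert=False, count_only=False):
    # Illustrative only: treat the pattern as a literal unless regex=True,
    # honor case sensitivity, optionally invert, optionally return a count.
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    hits = [ln for ln in lines if bool(compiled.search(ln)) != invert]
    return {"count": len(hits)} if count_only else hits
```

`invert` flips the membership test rather than the pattern itself, which is how grep's `-v` behaves.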
Provided detailed `long_lines` stats instead of just a boolean for whether long lines exist. Also detect if the file is binary and exit early.
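A sketch of what the binary check and long-line stats could look like; the `LongLineStats` field names and the NUL-byte heuristic are assumptions for illustration, not necessarily what `src/file_access.py` implements:

```python
from dataclasses import dataclass

@dataclass
class LongLineStats:
    # Hypothetical field names; the PR's actual dataclass may differ.
    count: int
    max_length: int
    threshold: int

def detect_binary(chunk: bytes) -> bool:
    # Common heuristic: a NUL byte in the first block implies a binary file.
    return b"\x00" in chunk

def long_line_stats(lines, threshold=1000):
    over = [ln for ln in lines if len(ln) > threshold]
    return LongLineStats(
        count=len(over),
        max_length=max((len(ln) for ln in lines), default=0),
        threshold=threshold,
    )
```

Checking only the first block for NUL bytes keeps the binary test cheap enough to run before any full parse.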
`edit_content`'s current API is clunky. There is no need for a dual mode when all changes can be specified in an array. This updates `edit_content` to take an array listing all changes (one item for a single change).
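The uniform array API might look like the following minimal sketch; the change-item keys (`search`, `replace`) and the replace-first-occurrence semantics are assumptions, not the PR's actual schema:

```python
def edit_content(path: str, changes: list[dict]) -> None:
    # Apply a list of changes in order; a single edit is just a
    # one-item array, so there is no separate single-change mode.
    with open(path) as f:
        text = f.read()
    for change in changes:
        # Illustrative change shape: {"search": ..., "replace": ...}
        text = text.replace(change["search"], change["replace"], 1)
    with open(path, "w") as f:
        f.write(text)
```

Collapsing both modes into one list parameter means callers and the schema only have one shape to validate.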
Currently the `target` parameter is overloaded to do different things based on the request being made. This is error-prone and just bad design. For simplicity's sake, `target` is now replaced with:
- `offset: int = 1` - starting line number (1-indexed)
- `limit: int = 100` - maximum lines to return
- `pattern: str | None = None` - optional search pattern

Also adds a `head` mode as a companion to tail.
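The new read parameters might behave like this sketch; the function bodies and the substring-filter semantics of `pattern` are illustrative assumptions, not the PR's actual implementation:

```python
def read_content(lines, offset=1, limit=100, pattern=None):
    # offset is 1-indexed; limit caps the window; pattern
    # optionally filters lines within that window.
    window = lines[offset - 1 : offset - 1 + limit]
    if pattern is not None:
        window = [ln for ln in window if pattern in ln]
    return window

def head(lines, n=10):
    # "head" as the companion to tail: the first n lines.
    return read_content(lines, offset=1, limit=n)
```

With explicit `offset`/`limit`/`pattern`, each request shape is visible in the call signature instead of being inferred from an overloaded `target`.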
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4 +/- ##
==========================================
+ Coverage 82.63% 86.77% +4.13%
==========================================
Files 12 12
Lines 1123 1240 +117
==========================================
+ Hits 928 1076 +148
+ Misses 195 164 -31
Pull request overview
This pull request enhances the largefile MCP tool with improved file parsing and search capabilities to reduce the need for external tools like grep, sed, and ripgrep.
Changes:
- Enhanced `search_content` with regex, case-sensitivity, invert, and count-only modes
- Improved `get_overview` with binary file detection and detailed long line statistics
- Simplified `edit_content` to use a uniform array-based changes API (breaking change)
- Clarified `read_content` API with explicit offset/limit/pattern parameters (breaking change)
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Version bump to 0.1.3 |
| pyproject.toml | Version bump and status upgrade to Production/Stable |
| src/data_models.py | Added LongLineStats dataclass, updated FileOverview with binary detection fields |
| src/file_access.py | Added binary file detection, long line stats, and read_head function |
| src/search_engine.py | Added regex matching, case sensitivity, and invert support |
| src/tools.py | Updated all tools with new features and breaking API changes |
| src/mcp_schemas.py | Updated read_content schema with new parameters |
| docs/API.md | Comprehensive documentation updates for all new features |
| README.md | Updated examples and feature descriptions |
| tests/unit/*.py | Extensive test coverage for new features |
| tests/integration/*.py | Updated integration tests for new API |
src/tools.py
Outdated
"lines_returned": len(content_lines),
"total_lines": total_lines,
"mode": mode,
"truncated": end_line < start_line + limit - 1 and end_line < total_lines,
The truncation logic is incorrect. If end_line < total_lines, it means there's more content in the file beyond what was returned, so truncated should be True. The current condition end_line < start_line + limit - 1 and end_line < total_lines will almost never be true given that end_line = min(total_lines, start_line + limit - 1). For example, if start_line=1, limit=100, total_lines=150, then end_line=100, and the condition evaluates to 100 < 100 and 100 < 150 = False, incorrectly indicating no truncation. The fix should be: "truncated": end_line < total_lines
- "truncated": end_line < start_line + limit - 1 and end_line < total_lines,
+ "truncated": end_line < total_lines,
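To see why the suggested fix matters, the reviewer's example can be worked through in a small sketch (the variable names mirror those in the comment):

```python
# Values from the reviewer's example: a 150-line file,
# reading from line 1 with a limit of 100 lines.
total_lines, start_line, limit = 150, 1, 100
end_line = min(total_lines, start_line + limit - 1)  # clamps to 100

# Original condition: the first clause can never hold when
# end_line was just clamped to start_line + limit - 1.
old_truncated = end_line < start_line + limit - 1 and end_line < total_lines

# Suggested fix: truncated simply means the file has more lines
# beyond what was returned.
new_truncated = end_line < total_lines
```

The old expression reports no truncation here even though 50 lines were left unread; the fixed expression reports it correctly.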
tests/unit/test_search_engine.py
Outdated
"""count_only=True returns just the count."""
result = search_content(temp_file, "error", fuzzy=False, count_only=True)
assert "count" in result
assert result["count"] == 3  # error, error:, Error
The comment is misleading. With case_sensitive=True, the search for "error" matches: line 2 "errors", line 3 "if error:", and line 7 "def process_error()". It does NOT match line 4's "Error" (capital E) since the search is case-sensitive. The comment should clarify which occurrences are being counted, e.g., "# errors, error:, process_error" rather than "# error, error:, Error".
- assert result["count"] == 3  # error, error:, Error
+ assert result["count"] == 3  # errors, error:, process_error
an error of this type in a comment in a test file is rather pedantic, no? "Fixing" it improves neither code quality nor correctness. This quality of PR review is not good
tests/unit/test_search_engine.py
Outdated
result = search_content(
    temp_file, "error", fuzzy=False, case_sensitive=False, count_only=True
)
assert result["count"] == 4  # error (2), Error (2)
The comment is misleading. With case_sensitive=False, the search for "error" matches 4 occurrences total: line 2 "errors", line 3 "if error:", line 4 "Error occurred" (case-insensitive match), and line 7 "def process_error()". The comment "# error (2), Error (2)" suggests there are 2 exact "error" matches and 2 exact "Error" matches, but this is not accurate. Consider clarifying to "# errors, error:, Error, process_error" or "# 4 total case-insensitive matches".
- assert result["count"] == 4  # error (2), Error (2)
+ assert result["count"] == 4  # 4 total case-insensitive matches: errors, error:, Error, process_error
Add file parsing capabilities to `largefile` to avoid the need for LLMs to reach for other tools like grep, sed, ripgrep etc.
- `search_content` now supports pattern matching & other search features (regex, invert, case sensitivity, count_only)
- `get_overview` now provides detailed stats (not just a boolean `long_lines`) and checks for binary files to avoid trying to process them
- `edit_content`'s dual mode is clunky; this replaces it with a uniform API for making multiple changes using an array for the list of changes
- The overloaded `target` parameter is replaced with clear explicit params: `offset`, `limit`, `pattern`
largefileto avoid the need for LLMs to reach for other tools like grep, sed, ripgrep etc.search_contentto support pattern matching & other search features (regex, invert, case sensitivity, count_only)get_overview- now provides detailed stats (not just a boolean long_lines) and checks for binary files to avoid trying to process it.edit_content's dual mode is clunky. this replaces it with a uniform api for making multiple changes using an array for the list of changestargetparameter instead provides clear explicit params to use:offset,limit,pattern