feat(cli): add stats command to display graph statistics #255
base: main
Adds a new `graph-code stats` command that displays statistics about the indexed codebase graph, including:
- Node counts by type (Function, Class, Module, etc.)
- Relationship counts by type (CALLS, CONTAINS, INHERITS, etc.)
- Total node and relationship counts
- Formatted output using Rich tables

Fixes vitali87#248
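To make the "counts by type plus totals" part concrete, here is a minimal sketch of how result rows could be reduced to sorted (type, count) pairs and a grand total before being handed to a table renderer. The row shape (`"type"` and `"count"` keys) and the name `summarize_counts` are hypothetical, not taken from the PR, and Rich is left out so the snippet stays self-contained.

```python
from collections.abc import Iterable, Mapping

# Hypothetical row shape: one mapping per result row with a "type" key
# (node label or relationship type) and an integer "count" key.
Row = Mapping[str, object]


def summarize_counts(rows: Iterable[Row]) -> tuple[list[tuple[str, int]], int]:
    """Return (type, count) pairs sorted by count descending, plus the grand total."""
    pairs: list[tuple[str, int]] = []
    for row in rows:
        type_name = row.get("type")
        count = row.get("count")
        # Skip rows without the expected shape instead of crashing the command.
        if not isinstance(type_name, str) or not isinstance(count, int):
            continue
        pairs.append((type_name, count))
    # Largest groups first; ties broken alphabetically for stable output.
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    total = sum(count for _, count in pairs)
    return pairs, total
```

Each pair maps directly onto one table row, and the total becomes the footer line, so an empty database naturally yields an empty table with a total of 0.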
Summary of Changes (Gemini Code Assist): This pull request introduces a new command-line interface feature that lets users quickly gain insight into the structure and content of their indexed codebase knowledge graph. By providing a clear, tabulated summary of node and relationship statistics, it improves the tool's usability and makes the graph data easier to understand and debug.
Code Review
This pull request introduces a new stats command to the CLI, which is a great addition for visibility into the knowledge graph. The implementation is straightforward, but I've identified a few areas for improvement concerning robustness, adherence to API contracts, and code structure. My main suggestions involve making the Cypher queries more robust, using public methods instead of internal ones, removing an unused CLI parameter, and considering refactoring duplicated code for better maintainability, while acknowledging the trade-off with readability. These changes will make the new command more reliable and easier to maintain in the future.
Greptile Overview
Greptile Summary
This PR adds a new
What Changed
Integration with Codebase
The implementation follows the existing CLI command pattern:
Issues Identified
Critical Issues (Must Fix)
Style/Best Practice Issues (Should Fix)
Positive Aspects
Confidence Score: 2/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant User
participant CLI as cli.py:stats()
participant Settings as config.settings
participant Main as main.py
participant Ingestor as MemgraphIngestor
participant Memgraph as Memgraph DB
participant Console as Rich Console
User->>CLI: graph-code stats [--batch-size N]
CLI->>Console: Print "Connecting to Memgraph..."
CLI->>Settings: resolve_batch_size(batch_size)
Settings-->>CLI: effective_batch_size
CLI->>Main: connect_memgraph(effective_batch_size)
Main->>Ingestor: __init__(host, port, batch_size)
Main->>Ingestor: __enter__()
Ingestor->>Memgraph: connect()
Memgraph-->>Ingestor: connection established
Ingestor-->>Main: ingestor instance
Main-->>CLI: ingestor
CLI->>Ingestor: _execute_query("MATCH (n) RETURN labels(n)[0]...")
Ingestor->>Memgraph: Execute Cypher query (node counts)
Memgraph-->>Ingestor: node statistics results
Ingestor-->>CLI: node_results
CLI->>Ingestor: _execute_query("MATCH ()-[r]->() RETURN type(r)...")
Ingestor->>Memgraph: Execute Cypher query (relationship counts)
Memgraph-->>Ingestor: relationship statistics results
Ingestor-->>CLI: rel_results
CLI->>CLI: Calculate total_nodes and total_rels
CLI->>Console: Print node statistics table
CLI->>Console: Print relationship statistics table
CLI->>Ingestor: __exit__()
Ingestor->>Memgraph: close connection
alt Success
CLI->>User: Display statistics tables
else Exception
CLI->>Console: Print error message
CLI->>User: Exit with code 1
end
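The diagram's connection lifecycle (enter, run two count queries, total them, always exit) can be sketched with a fake ingestor. `FakeIngestor`, `collect_stats`, the canned-result dictionary, and the exact query strings are all hypothetical illustrations; only the use of a context manager and a `fetch_all()` method comes from this thread, and the real `fetch_all()` signature may differ.

```python
from typing import Any

# Assumed query text for illustration; in the PR the real strings are constants in cypher_queries.py.
NODE_COUNTS_QUERY = "MATCH (n) RETURN labels(n) AS labels, count(*) AS count"
REL_COUNTS_QUERY = "MATCH ()-[r]->() RETURN type(r) AS type, count(*) AS count"


class FakeIngestor:
    """Hypothetical stand-in for MemgraphIngestor that returns canned rows per query."""

    def __init__(self, canned: dict[str, list[dict[str, Any]]]) -> None:
        self._canned = canned
        self.connected = False

    def __enter__(self) -> "FakeIngestor":
        # The real ingestor opens the Memgraph connection here.
        self.connected = True
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Runs even if a query raised, so the connection is always released.
        self.connected = False

    def fetch_all(self, query: str) -> list[dict[str, Any]]:
        if not self.connected:
            raise RuntimeError("fetch_all called outside the with-block")
        return self._canned.get(query, [])


def collect_stats(ingestor: FakeIngestor) -> dict[str, Any]:
    """Run both count queries on one connection, then total them client-side."""
    with ingestor as conn:
        node_rows = conn.fetch_all(NODE_COUNTS_QUERY)
        rel_rows = conn.fetch_all(REL_COUNTS_QUERY)
    return {
        "node_rows": node_rows,
        "rel_rows": rel_rows,
        "total_nodes": sum(row["count"] for row in node_rows),
        "total_rels": sum(row["count"] for row in rel_rows),
    }
```

Because `__exit__` returns `None`, any query error still propagates to the caller, which is where the diagram's "print error, exit with code 1" branch would live.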
6 files reviewed, 6 comments
@AndyBodnar please clear all comments flagged by the bots. I will review only after they are all resolved.
- Move Cypher queries to `cypher_queries.py` as constants
- Use public `fetch_all()` instead of private `_execute_query()`
- Fix multi-label handling by returning all labels and joining with ':'
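The multi-label fix matters because `labels(n)[0]` attributes a node with several labels to whichever label happens to come first. A minimal sketch of the "return all labels and join with ':'" approach follows; the query constant name, the `"(no label)"` placeholder for unlabelled nodes, and both helper names are hypothetical, and the real constants in `cypher_queries.py` may be worded differently.

```python
from collections import Counter
from collections.abc import Iterable, Mapping, Sequence

# Assumed query shape: group on the full label list instead of labels(n)[0].
NODE_LABEL_COUNTS = "MATCH (n) RETURN labels(n) AS labels, count(*) AS count"


def label_key(labels: Sequence[str]) -> str:
    """Join every label, so a node labelled [Function, Method] reports as 'Function:Method'."""
    if not labels:
        return "(no label)"  # hypothetical display name for unlabelled nodes
    return ":".join(labels)


def node_counts_by_label(rows: Iterable[Mapping[str, object]]) -> Counter[str]:
    """Accumulate counts per joined label key, ignoring malformed rows."""
    counts: Counter[str] = Counter()
    for row in rows:
        labels = row.get("labels")
        count = row.get("count")
        if isinstance(labels, list) and isinstance(count, int):
            counts[label_key([str(label) for label in labels])] += count
    return counts
```

Using a `Counter` means two rows that happen to join to the same key are merged rather than shown twice.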
I've addressed all the bot review comments in the latest commit:
Ready for review when you have time!

All the bot comments have been resolved. Ready for your review whenever you have time.

/gemini review
Code Review
This pull request introduces a useful stats command to display graph statistics. The implementation is straightforward and uses rich to present the data nicely. I've added a few suggestions to improve code clarity in the data processing loops and to make the error handling more robust. Overall, a good addition to the CLI.
Removed the `batch_size` parameter, since `stats` only reads from the database and write buffering doesn't apply here. Also removed an inline comment that was flagged for violating the project's comment policy.
Cleaned up the remaining items:
The earlier commit already moved the Cypher queries to `cypher_queries.py` and switched to `fetch_all()` instead of the private method. Should be good to go now.
Thanks for catching those! Yeah, the broad `Exception` catch is lazy on my part; I'll narrow it down to the specific mgclient errors. And good call on the redundant int/string conversions, I'll clean those up too. Pushing the fixes shortly.
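Narrowing the catch means listing only the failures the command knows how to report and letting everything else surface as a real bug. The thread does not show which mgclient exception classes were chosen, so this sketch uses two hypothetical stand-ins (`GraphConnectionError`, `GraphQueryError`); `run_stats` and its callable parameters are likewise illustrative, not the PR's code.

```python
from collections.abc import Callable


class GraphConnectionError(Exception):
    """Hypothetical stand-in for the driver's connection-failure exception."""


class GraphQueryError(Exception):
    """Hypothetical stand-in for the driver's query-failure exception."""


# Only failures the command knows how to report; anything else (a real bug) propagates.
EXPECTED_ERRORS: tuple[type[Exception], ...] = (GraphConnectionError, GraphQueryError)


def run_stats(
    fetch: Callable[[], dict[str, int]],
    report: Callable[[str], None] = print,
) -> int:
    """Return a process exit code: 0 on success, 1 on an expected database error."""
    try:
        stats = fetch()
    except EXPECTED_ERRORS as exc:
        report(f"Error: could not read graph statistics: {exc}")
        return 1
    for name, value in stats.items():
        # value is already an int, so format it directly without str/int round-trips.
        report(f"{name}: {value}")
    return 0
```

A CLI entry point could then finish with `raise SystemExit(run_stats(...))`, which matches the "Memgraph unavailable: exit code 1" behaviour reported later in the thread.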
Thank you @AndyBodnar. Excellent job! Let me run and verify the functionality; if all is good, I will approve and merge 🙌
Did everything clean up efficiently?
Code Review: Type Safety, Linting, and Test Coverage Required
I've tested this PR thoroughly and found several issues that need to be addressed before merging.
1. Import Ordering (Linting Error)
File: The imports are not properly sorted. Running
Fix: Run
2. Type Safety Issues with
| Check | Status |
|---|---|
| `uv run ruff check` | ✅ |
| `uv run ty check` | ✅ |
| `uv run pytest` | ✅ 2781 passed |
Manual Testing

| Scenario | Result |
|---|---|
| Empty database | ✅ Shows tables with 0 totals |
| Populated database | ✅ Correct node/relationship counts displayed |
| Memgraph unavailable | ✅ Graceful error, exit code 1 |
| `--help` flag | ✅ Proper help text |
Summary
- Run pre-commit hooks: this is mandatory and would catch these issues automatically
- Fix import ordering with `ruff check --fix`
- Add `_get_count()` and `_get_labels()` helper functions with proper type narrowing
- Import `ResultRow` from `.types_defs`
- Update the stats function to use the helper functions
- Add unit tests for the helper functions and stats command behavior
The code works at runtime because Memgraph returns the correct types, but it doesn't pass our static type checks since ResultRow is typed broadly. Please address the items above to satisfy linting and type checking requirements. If you have any questions or comments, let me know.
@AndyBodnar see above regarding my test report.
Summary
Adds a new `graph-code stats` command that displays statistics about the indexed codebase knowledge graph.
Changes
- Add `stats` command to `cli.py`
- Add `STATS` to `CLICommandName` enum in `cli_help.py`
- Add `CMD_STATS` help text
Example Output
Test plan
- `graph-code stats` with Memgraph connected

Fixes #248