@marklabz

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original description:

This pull request proposes merging the feature-branch from @RaZzzyz/LeanRAG into the main branch of marklabz/LeanRAG. The branch includes major improvements to the LeanRAG framework with Docker support and enhanced functionality.

Commit message summary:

🚀 New Features:

  • Added Docker support with MySQL container setup (Dockerfile.mysql, docker-compose.yml)
  • Comprehensive MySQL Docker setup documentation (MYSQL_DOCKER_README.md)
  • New configuration templates and examples for CommonKG
  • Test entities configuration for small-scale testing
  • Comprehensive logging system for knowledge graph creation
  • New dataset processing capabilities with mix_chunk support

🔧 Improvements:

  • Enhanced database utilities with better MySQL connection handling
  • Improved code formatting and structure across multiple modules
  • Better error handling and database name validation
  • Enhanced JSONL file format support in chunk processing
  • Improved search functionality with optimized node retrieval
  • Better path handling for cross-platform compatibility
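The description does not show how database names are validated. As a minimal sketch of one way to do it, assuming a conservative rule (letters, digits, and underscores, at most 64 characters, which is MySQL's identifier length limit), the helpers below reject unsafe names before they are interpolated into SQL. The names `validate_db_name` and `create_database_sql` are hypothetical and are not taken from the LeanRAG code.

```python
import re

# Assumed rule (not from the PR): letters, digits, and underscores only,
# 1 to 64 characters. 64 is MySQL's maximum identifier length.
_DB_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,64}")


def validate_db_name(name: str) -> str:
    """Return the name unchanged if it passes the rule, else raise ValueError."""
    if not isinstance(name, str) or not _DB_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid database name: {name!r}")
    return name


def create_database_sql(name: str) -> str:
    """Build a CREATE DATABASE statement for a validated name."""
    # Identifiers cannot be bound as query parameters, so the name is
    # validated first and then quoted with backticks.
    return f"CREATE DATABASE IF NOT EXISTS `{validate_db_name(name)}`"
```

Validating against an allow-list rather than escaping is a deliberate choice here: it is stricter than MySQL requires, but it keeps the failure mode obvious.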

📊 Data Processing:

  • New entity and relation processing capabilities
  • Enhanced triple extraction and deduplication
  • Improved community detection and clustering
  • Better text unit aggregation with configurable parameters
  • Enhanced vector search integration with Milvus
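To illustrate what triple deduplication can look like, here is a minimal sketch that keeps the first occurrence of each (head, relation, tail) triple. It assumes triples are plain string tuples and that two triples count as duplicates when they match after whitespace and case normalization. Both are assumptions made for illustration; the function name `dedupe_triples` is hypothetical, and the PR's actual approach may differ.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case when comparing.
    return " ".join(text.split()).casefold()


def dedupe_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Keep the first occurrence of each triple, preserving input order."""
    seen = set()
    unique = []
    for head, relation, tail in triples:
        key = (_normalize(head), _normalize(relation), _normalize(tail))
        if key not in seen:
            seen.add(key)
            unique.append((head, relation, tail))
    return unique
```

The original spelling of the first occurrence is preserved; only the comparison key is normalized.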

🛠️ Technical Enhancements:

  • Code cleanup and formatting improvements
  • Better separation of concerns in database operations
  • Enhanced configuration management
  • Improved file I/O operations
  • Better memory management in large dataset processing
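One common way to bound memory when processing a large JSONL dataset is to read it in fixed-size batches instead of loading the whole file. The sketch below shows that pattern under stated assumptions: `iter_jsonl_batches` is a hypothetical helper, the default batch size is arbitrary, and nothing here is confirmed to match what the PR implements.

```python
import json
from typing import Dict, Iterator, List, TextIO


def iter_jsonl_batches(stream: TextIO, batch_size: int = 1000) -> Iterator[List[Dict]]:
    """Yield lists of parsed JSONL records, at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

Because this is a generator, only one batch is held in memory at a time. Pass it any open text file, for example `open(path, encoding="utf-8")`.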

📁 Project Structure:

  • Python package initialization (__init__.py)
  • Organized configuration files with examples and templates
  • Better separation of test and production configurations
  • Enhanced logging directory structure

This PR makes the LeanRAG framework more robust, scalable, and easier to deploy with Docker support while maintaining backward compatibility.



marklabz and others added 4 commits August 25, 2025 15:57
…patibility

✨ Features:
- Add automated installation script (install.sh) with Python version selection
- Add full installation script (install-full.sh) for Python 3.10 compatibility
- Support flexible Python versions (3.10, 3.11, 3.12) via command-line args

🔧 Improvements:
- Update nano-graphrag to use latest version with Python 3.11+ support
- Enhanced error handling and user guidance in installation scripts
- Better dependency management with optional packages
- Comprehensive installation verification with import testing

📦 Installation Options:
- ./install.sh          # Default Python 3.11
- ./install.sh 3.12     # Use Python 3.12
- ./install-full.sh     # Python 3.10 for maximum compatibility

🐛 Fixes:
- Resolve nano-graphrag Python compatibility issues
- Improve pyproject.toml dependency management
- Add proper error handling for failed package installations

Co-authored-by: GitHub Copilot <copilot@github.com>
… enhanced functionality

🚀 New Features:
- Add Docker support with MySQL container setup (Dockerfile.mysql, docker-compose.yml)
- Add comprehensive MySQL Docker setup documentation (MYSQL_DOCKER_README.md)
- Add new configuration templates and examples for CommonKG
- Add test entities configuration for small-scale testing
- Add comprehensive logging system for knowledge graph creation
- Add new dataset processing capabilities with mix_chunk support

🔧 Improvements:
- Enhanced database utilities with better MySQL connection handling
- Improved code formatting and structure across multiple modules
- Better error handling and database name validation
- Enhanced JSONL file format support in chunk processing
- Improved search functionality with optimized node retrieval
- Better path handling for cross-platform compatibility

📊 Data Processing:
- Add new entity and relation processing capabilities
- Enhanced triple extraction and deduplication
- Improved community detection and clustering
- Better text unit aggregation with configurable parameters
- Enhanced vector search integration with Milvus

🛠️ Technical Enhancements:
- Code cleanup and formatting improvements across all modules
- Better separation of concerns in database operations
- Enhanced configuration management
- Improved file I/O operations with better format detection
- Better memory management in large dataset processing

📁 Project Structure:
- Add proper Python package initialization (__init__.py)
- Organize configuration files with examples and templates
- Better separation of test and production configurations
- Enhanced logging directory structure

This commit represents a significant enhancement to the LeanRAG framework, making it more robust, scalable, and easier to deploy with Docker support while maintaining backward compatibility.
@marklabz marklabz mentioned this pull request Aug 25, 2025