2 changes: 1 addition & 1 deletion Cargo.toml
@@ -43,7 +43,7 @@ tempfile = "3.8"

[[bin]]
name = "mlclr"
-path = "src/vector_molecular.rs"
+path = "src/main.rs"

[[bin]]
name = "check_versions"
168 changes: 168 additions & 0 deletions src/MODULE_TREE.md
@@ -0,0 +1,168 @@
# Molecular MCP - Module Tree Structure

## Overview
This document gives an overview of the Molecular MCP source tree after refactoring for LLM-friendliness. The target is to keep every file under 400 lines; four files still exceed that limit (see "LLM-Friendliness Status").

```
src/
├── lib.rs (37 lines) - Main library entry point
├── main.rs (267 lines) - Binary entry point (mlclr)
├── check_versions.rs (34 lines) - Version compatibility checking
├── MODULE_TREE.md (THIS FILE) - Architecture documentation
├── bin/
│ ├── duck.rs (334 lines) - Duck session naming utility
│ └── test_embeddings.rs (31 lines) - Embedding system tests
├── ingestion/ - Input systems and data ingestion
│ ├── mod.rs (11 lines) - Module re-exports
│ ├── fifo_ingestion.rs (337 lines) - FIFO-based event ingestion
│ └── README.md - Module documentation
├── models/ - Core data structures and schemas
│ ├── mod.rs (18 lines) - Module re-exports
│ ├── embeddings.rs (373 lines) - ML embeddings functionality
│ ├── config.rs (421 lines) - Configuration structures [NEEDS SPLIT]
│ ├── schema/ - Vector-native event schema
│ │ ├── mod.rs (67 lines) - Schema module coordination
│ │ ├── events.rs (200 lines) - Event types and enumerations
│ │ ├── content.rs (175 lines) - Content and context structures
│ │ └── queries.rs (200 lines) - Search queries and utilities
│ └── README.md - Module documentation
├── processing/ - Event processing and analysis
│ ├── mod.rs (16 lines) - Module re-exports
│ ├── ring_buffer.rs (592 lines) - Multi-tier event buffering [NEEDS SPLIT]
│ ├── semantic_classifier.rs (351 lines) - Event classification
│ ├── compression_pipeline.rs (402 lines) - Event compression [NEEDS SPLIT]
│ └── README.md - Module documentation
├── server/ - MCP server implementations
│ ├── mod.rs (20 lines) - Module re-exports
│ ├── core.rs (127 lines) - VectorMolecularSystem core
│ ├── tcp.rs (131 lines) - TCP server implementation
│ ├── stdio.rs (92 lines) - Stdio server implementation
│ ├── tools.rs (202 lines) - MCP tool implementations
│ ├── fifo_consumers.rs (332 lines) - FIFO event consumers
│ ├── json_rpc.rs (420 lines) - JSON-RPC processing [NEEDS SPLIT]
│ └── README.md - Module documentation
└── storage/ - Data persistence abstractions
  ├── mod.rs (14 lines) - Module re-exports
  ├── cache.rs (66 lines) - Event caching layer
  ├── lancedb/ - LanceDB vector database integration
  │ ├── mod.rs (84 lines) - LanceDB module coordination
  │ ├── connection.rs (95 lines) - Database connection & tables
  │ ├── operations.rs (200 lines) - Core CRUD operations
  │ ├── search.rs (195 lines) - Semantic search & queries
  │ └── statistics.rs (150 lines) - Analytics & performance metrics
  └── README.md - Module documentation
```

## Module Responsibilities

### 🎯 **Core Modules**

**`lib.rs`** - Library coordination and re-exports
- Provides unified API for the entire molecular library
- Re-exports key types from all modules
- Minimal dependency coordination
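
The re-export pattern described above can be sketched in miniature. This is an illustration only, not the contents of `lib.rs`: the module bodies are inlined, and the two-field `MolecularEvent` and the `new_event` helper are hypothetical stand-ins (the `u64` type for `event_sequence` is an assumption).

```rust
// Hypothetical miniature of the layout: a nested module plus re-exports.
mod models {
    pub mod schema {
        /// Stand-in for an event type; the real struct carries more fields.
        #[derive(Debug, Clone, PartialEq)]
        pub struct MolecularEvent {
            pub project: String,
            pub event_sequence: u64, // assumed integer type
        }
    }
    // Module-level re-export: callers can skip the `schema` path segment.
    pub use schema::MolecularEvent;
}

// Root-level re-export: this is the "unified API" surface of the crate.
pub use models::MolecularEvent;

/// Hypothetical helper that builds an event through the short path.
pub fn new_event(project: &str) -> MolecularEvent {
    MolecularEvent {
        project: project.to_string(),
        event_sequence: 0,
    }
}
```

With this shape, a binary can import the type from the crate root instead of spelling out the full module path, which is the change made to the `use` line in `src/bin/test_embeddings.rs`.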

**`main.rs`** - Application entry point (mlclr binary)
- Command-line interface for the molecular MCP server
- Server startup, configuration loading, signal handling
- Multi-transport coordination (TCP, stdio, FIFO)

### 📥 **Ingestion Module** (`ingestion/`)

**Purpose**: Handle various input mechanisms for molecular events

- `fifo_ingestion.rs` - FIFO-based real-time event streaming
- Future: HTTP endpoints, file watchers, webhooks

**Why isolated**: Input mechanisms are independent and may grow significantly as we add more protocols.
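
As a rough sketch of why ingestion isolates cleanly, line-oriented input can be written against any buffered reader, so a FIFO, a file, or an in-memory buffer all go through the same function. `RawEvent` and `ingest_lines` are hypothetical names for illustration and are not taken from `fifo_ingestion.rs`.

```rust
use std::io::BufRead;

/// Hypothetical, simplified event; not the crate's `MolecularEvent`.
#[derive(Debug, PartialEq)]
pub struct RawEvent {
    pub sequence: usize,
    pub text: String,
}

/// Reads newline-delimited events from any buffered reader.
/// A named pipe opened with `File::open` and wrapped in a `BufReader`
/// could be passed here just like an in-memory buffer.
pub fn ingest_lines<R: BufRead>(reader: R) -> std::io::Result<Vec<RawEvent>> {
    let mut events = Vec::new();
    for line in reader.lines() {
        let line = line?; // propagate I/O errors to the caller
        let text = line.trim();
        if text.is_empty() {
            continue; // skip blank lines instead of emitting empty events
        }
        let sequence = events.len();
        events.push(RawEvent {
            sequence,
            text: text.to_string(),
        });
    }
    Ok(events)
}
```

Because the function depends only on `BufRead`, a future HTTP or file-watcher source would only need to produce a reader or call the same parsing step.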

### 🏗️ **Models Module** (`models/`)

**Purpose**: Define all data structures, configurations, and schemas

- `config.rs` - System configuration and settings
- `embeddings.rs` - ML embedding models and utilities
- `schema/` - Vector-native event schema (decomposed for LLM-friendliness)

**Schema Decomposition Rationale**:
- **`events.rs`** - Core event types (200 lines): Event definitions were complex enough to warrant focused attention
- **`content.rs`** - Content structures (175 lines): Event payloads have rich structure deserving separate module
- **`queries.rs`** - Search utilities (200 lines): Query handling and timestamp utilities are distinct concerns

### ⚙️ **Processing Module** (`processing/`)

**Purpose**: Event processing, analysis, and transformation pipeline

- `ring_buffer.rs` - Multi-tier buffering with flood protection 🔄 *[NEEDS SPLIT]*
- `semantic_classifier.rs` - AI-powered event categorization
- `compression_pipeline.rs` - Event compression and archival 🔄 *[NEEDS SPLIT]*

**Future decomposition needed**: Ring buffer and compression pipeline exceed 400-line guideline.
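
For orientation, flood protection in a buffer can be as simple as a fixed capacity that discards the oldest entry on overflow. The sketch below shows a single tier only; `BoundedBuffer` is a hypothetical type and does not reproduce the multi-tier logic in `ring_buffer.rs`.

```rust
use std::collections::VecDeque;

/// Hypothetical single-tier buffer: fixed capacity, oldest entry dropped first.
pub struct BoundedBuffer<T> {
    items: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> BoundedBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            items: VecDeque::with_capacity(capacity),
            capacity,
            dropped: 0,
        }
    }

    /// Adds an item; when the buffer is full the oldest item is discarded.
    pub fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Number of items discarded because of overflow.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }

    /// Removes and returns all buffered items, oldest first.
    pub fn drain_all(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }
}
```

A small type like this is also a plausible seam for the pending split: the storage of one tier can live in its own file, separate from prioritization and statistics.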

### 🌐 **Server Module** (`server/`)

**Purpose**: MCP server implementations and communication protocols

- `core.rs` - Central VectorMolecularSystem orchestration
- `tcp.rs` / `stdio.rs` - Transport-specific implementations
- `tools.rs` - MCP tool method implementations
- `fifo_consumers.rs` - Background FIFO processing
- `json_rpc.rs` - JSON-RPC protocol handling 🔄 *[NEEDS SPLIT]*

### 💾 **Storage Module** (`storage/`)

**Purpose**: Data persistence and retrieval abstractions

- `cache.rs` - In-memory event caching with LRU eviction
- `lancedb/` - **Successfully decomposed** vector database integration:

**LanceDB Decomposition Success**:
- **`connection.rs`** (95 lines) - Database lifecycle management
- **`operations.rs`** (200 lines) - CRUD operations and caching
- **`search.rs`** (195 lines) - Vector similarity search
- **`statistics.rs`** (150 lines) - Analytics and monitoring
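
The LRU eviction mentioned for `cache.rs` can be illustrated with a minimal cache built on the standard library. This is a sketch under assumptions, not the code in `cache.rs`: `LruCache` is a hypothetical type, and the `VecDeque` recency list makes each access O(n), which is acceptable only for small caches.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical LRU cache: `order` holds keys from least to most recently used.
pub struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Eq + Hash + Clone, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Moves `key` to the most-recently-used end of the recency list.
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    /// Returns the value and marks the key as recently used.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Inserts or updates a value, evicting the least recently used key if full.
    pub fn put(&mut self, key: K, value: V) {
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.touch(&key);
        self.map.insert(key, value);
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```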

## 📊 LLM-Friendliness Status

### ✅ **Compliant Files** (Under 400 lines)
- All files in `storage/lancedb/` - Successfully decomposed
- All files in `models/schema/` - Successfully decomposed
- Most server, ingestion, and utility files

### 🔄 **Still Need Decomposition**
1. **`processing/ring_buffer.rs`** (592 lines) - Multi-tier buffering logic
2. **`processing/compression_pipeline.rs`** (402 lines) - Event compression
3. **`server/json_rpc.rs`** (420 lines) - JSON-RPC protocol handling
4. **`models/config.rs`** (421 lines) - Configuration structures
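
The 400-line guideline is easy to check mechanically. The sketch below is one possible check, not tooling that exists in this repository; `LINE_LIMIT`, `exceeds_limit`, and `files_needing_split` are hypothetical names, and "over the limit" is read as strictly more than 400 lines.

```rust
/// Guideline from this document: files should stay under this many lines.
pub const LINE_LIMIT: usize = 400;

/// Returns true when `source` has more than `limit` lines.
pub fn exceeds_limit(source: &str, limit: usize) -> bool {
    source.lines().count() > limit
}

/// Given (path, line count) pairs, returns the paths over the limit, in input order.
pub fn files_needing_split<'a>(files: &[(&'a str, usize)], limit: usize) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, lines)| *lines > limit)
        .map(|(name, _)| *name)
        .collect()
}
```

Fed the line counts from the tree above, `files_needing_split` would return the same four paths listed in this section.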

### 🎯 **Target Architecture Benefits**

- **Maintainability**: Each file has a single, focused responsibility
- **Testability**: Components can be unit tested in isolation
- **Extensibility**: New storage backends, server types, and similar components can be added without touching unrelated modules
- **LLM-Friendly**: Files under 400 lines are small enough to read and edit whole in AI-assisted development
- **Type Safety**: Clear module boundaries with well-defined interfaces

## 🚀 **Next Steps**

1. **Complete remaining decompositions** for files exceeding 400 lines
2. **Add trait abstractions** for Storage and Processing layers
3. **Implement domain-specific error types** (replace anyhow)
4. **Add comprehensive integration tests** for decomposed modules
5. **Performance benchmarking** to ensure decomposition doesn't impact speed
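
Steps 2 and 3 could look roughly like the following. This is a sketch under assumptions rather than a proposed API: `EventStore`, `StorageError`, `InMemoryStore`, and `store_and_fetch` are hypothetical names, and events are reduced to string payloads to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical domain-specific error type for the storage layer.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    NotFound(String),
    Duplicate(String),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::NotFound(id) => write!(f, "event not found: {id}"),
            StorageError::Duplicate(id) => write!(f, "event already stored: {id}"),
        }
    }
}

impl std::error::Error for StorageError {}

/// Hypothetical storage abstraction; a LanceDB-backed type could implement it too.
pub trait EventStore {
    fn insert(&mut self, id: &str, payload: &str) -> Result<(), StorageError>;
    fn get(&self, id: &str) -> Result<String, StorageError>;
}

/// In-memory implementation, useful as a test double.
#[derive(Default)]
pub struct InMemoryStore {
    events: HashMap<String, String>,
}

impl EventStore for InMemoryStore {
    fn insert(&mut self, id: &str, payload: &str) -> Result<(), StorageError> {
        if self.events.contains_key(id) {
            return Err(StorageError::Duplicate(id.to_string()));
        }
        self.events.insert(id.to_string(), payload.to_string());
        Ok(())
    }

    fn get(&self, id: &str) -> Result<String, StorageError> {
        self.events
            .get(id)
            .cloned()
            .ok_or_else(|| StorageError::NotFound(id.to_string()))
    }
}

/// Callers depend on the trait, not on a concrete backend.
pub fn store_and_fetch(
    store: &mut dyn EventStore,
    id: &str,
    payload: &str,
) -> Result<String, StorageError> {
    store.insert(id, payload)?;
    store.get(id)
}
```

Because `StorageError` implements `std::error::Error`, it still converts into `anyhow::Error` with `?` at call sites that have not migrated yet, so the replacement can proceed module by module.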

## 🔧 **Development Workflow**

- **For AI Assistants**: Most modules are now under 400 lines, which suits focused, file-at-a-time development; the four files listed under "Still Need Decomposition" are the exceptions
- **For Humans**: Clear separation of concerns makes debugging and feature development easier
- **For Testing**: Isolated modules can be unit tested without starting the full server

---

*This architecture moves the codebase toward the LLM-friendliness goal while maintaining clean separation of concerns and extensibility for future molecular intelligence features; four files still exceed the 400-line guideline.*
2 changes: 1 addition & 1 deletion src/bin/test_embeddings.rs
@@ -1,4 +1,4 @@
-use molecular::embeddings::{MolecularEmbeddings, EmbeddingConfig, EmbeddingInput};
+use molecular::{MolecularEmbeddings, EmbeddingConfig, EmbeddingInput};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
12 changes: 5 additions & 7 deletions src/fifo_ingestion.rs → src/ingestion/fifo_ingestion.rs
@@ -5,9 +5,7 @@
* This is the core missing piece that enables true real-time molecular capture.
*/

-use crate::vector_database::MolecularVectorDB;
-use crate::vector_schema::{MolecularEvent, EventType, EventImportance, EventContent};
-use crate::config::MolecularConfig;
+use crate::{*, EventSource, ResolutionStatus, FileOperation};
use anyhow::{Result, anyhow};
use serde_json::Value;
use std::path::Path;
@@ -130,11 +128,11 @@ impl FifoIngestionSystem {
"error" => EventType::ErrorInvestigation {
error_type: json.get("error_type").and_then(|e| e.as_str()).unwrap_or("unknown").to_string(),
error_code: json.get("error_code").and_then(|e| e.as_str()).map(|s| s.to_string()),
-resolution_status: crate::vector_schema::ResolutionStatus::Investigating,
+resolution_status: ResolutionStatus::Investigating,
},
"file_edit" => EventType::FileEdit {
file_path: json.get("file_path").and_then(|f| f.as_str()).unwrap_or("unknown").to_string(),
-operation: crate::vector_schema::FileOperation::Modify { old_size: 0, new_size: 0 },
+operation: FileOperation::Modify { old_size: 0, new_size: 0 },
lines_changed: json.get("lines_changed").and_then(|l| l.as_u64()).unwrap_or(0) as u32,
},
"learning" => EventType::Custom {
@@ -168,7 +166,7 @@ impl FifoIngestionSystem {
project: self.project.clone(),
event_sequence: 0, // Will be set by storage layer
event_type,
-source: crate::vector_schema::EventSource::Terminal,
+source: EventSource::Terminal,
importance: self.classify_importance(&content.primary_text),
content,
context,
@@ -213,7 +211,7 @@ impl FifoIngestionSystem {
event_name: "text_event".to_string(),
data: serde_json::json!({"content": line}),
}, // Default assumption
-source: crate::vector_schema::EventSource::Terminal,
+source: EventSource::Terminal,
importance: self.classify_importance(line),
content,
context,
12 changes: 12 additions & 0 deletions src/ingestion/mod.rs
@@ -0,0 +1,12 @@
/*
* INGESTION MODULE - Input systems and data ingestion
*
* This module provides various input mechanisms:
* - FIFO-based event ingestion
* - Future: HTTP ingestion, file watchers, etc.
*/

pub mod fifo_ingestion;

// Re-export key types
pub use fifo_ingestion::*;
67 changes: 22 additions & 45 deletions src/lib.rs
@@ -3,59 +3,36 @@
*
* This library provides the core molecular logging and vector search capabilities
* for development intelligence across all projects and sessions.
*
* REFACTORED ARCHITECTURE:
* - models/: Core data structures and schemas
* - server/: MCP server implementations (TCP, stdio, tools)
* - storage/: Data persistence (LanceDB, caching)
* - processing/: Event processing (ring buffer, classification, compression)
* - ingestion/: Input systems (FIFO, future HTTP/websockets)
*/

pub mod vector_schema;
pub mod embeddings;
pub mod vector_database; // FULL MOLECULAR INTELLIGENCE RESTORED!
// pub mod vec_db; // Removed - conflicts with heavyweight MolecularVectorDB

// RESTORED MODULES (Archaeological Expedition 2025-08-15 by VectorSonny):
// These production-ready modules were archived during LanceDB troubleshooting but are now restored
pub mod semantic_classifier; // 5-tier semantic classification: Crit→Learn→Ref→Ctx→Noise + intelligent dedup
pub mod compression_pipeline; // 90%+ intelligent compression with 3-tier storage: Full→Summary→Stats

// FIFO REAL-TIME INGESTION (PurgeMaster 2025-08-16):
// Real-time event streaming to replace batch file processing
pub mod fifo_ingestion; // FIFO pipe consumption for real-time molecular capture

// RING BUFFER SYSTEM (VectorSonny 2025-08-16):
// Multi-tier buffering to handle event floods from intensive development sessions
pub mod ring_buffer; // Ring buffer with semantic prioritization and compression
// Core module structure
pub mod models;
pub mod server;
pub mod storage;
pub mod processing;
pub mod ingestion;

// CONFIGURATION MANAGEMENT (QuantumFixer 2025-08-16):
// Centralized configuration system with environment variable support
pub mod config; // Master configuration for all molecular components
// Utility modules
pub mod check_versions;


// Re-export key types for convenience
pub use vector_schema::{
// Re-export key types for convenience (avoiding conflicts)
pub use models::{
MolecularEvent, EventType, EventSource, EventImportance, EventContent, EventContext,
SemanticQuery, SemanticSearchResult, CodeSnippet, FileOperation, ResolutionStatus, FeedbackType,
};

pub use embeddings::{
MolecularEmbeddings, EmbeddingConfig, EmbeddingDevice, EmbeddingInput, EmbeddingResult, EmbeddingUtils,
MolecularConfig, ServerConfig, FifoConfig, VectorDbConfig, EmbeddingsConfig, SystemConfig,
};

pub use vector_database::{
MolecularVectorDB, VectorDBConfig, VectorDBStatistics,
};

// RESTORED MODULE EXPORTS (VectorSonny Archaeological Restoration 2025-08-15):
pub use semantic_classifier::{
pub use storage::{MolecularVectorDB, VectorDBConfig, VectorDBStatistics, EventCache};
pub use processing::{
MolecularRingBuffer, BufferStats, BufferHealth,
EventClassifier, EventCategory, CompressionLevel, SemanticEvent, DedupEntry, ClassifierStats,
};

pub use compression_pipeline::{
CompressionPipeline, CompressedSession, CompressedEvent, EventData, SessionStats,
};

pub use ring_buffer::{
MolecularRingBuffer, RingBufferConfig, BufferStats, BufferHealth, CompressedEvent as RingCompressedEvent,
};

pub use config::{
MolecularConfig, ServerConfig, FifoConfig, RingBufferConfig as ConfigRingBufferConfig,
VectorDbConfig, EmbeddingsConfig, SystemConfig, CompressionLevel as ConfigCompressionLevel,
};
pub use ingestion::*;