Enhancement: Memory Observability and Debugging Tools

## Overview
Develop comprehensive observability tools for understanding memory system behavior, including visualization dashboards, debugging utilities, and analysis tools to support research, development, and educational use.

## Research Motivation

The current memory system is a "black box" with limited visibility into:
- **Memory Formation**: Why certain interactions become anchors
- **Decay Dynamics**: How memories fade over time
- **Retrieval Patterns**: Which memories are recalled and why
- **Interference Effects**: How memories interact and conflict
- **System Performance**: Bottlenecks, errors, and resource usage

Without observability:
- **Research Validation**: Hard to understand what the system actually does
- **Development**: Debugging is difficult and time-consuming
- **Education**: Students can't see how memory mechanisms work
- **Iteration**: Can't measure impact of changes systematically

## Proposed Implementation

### Part 1: Memory Visualization Dashboard

#### Web-Based Dashboard
```python
# FastAPI/Streamlit dashboard (interface only; method bodies to be implemented)
class MemoryDashboard:
    def render_memory_timeline(self, session_id: str):
        """Interactive timeline of memory formation and decay"""
        ...

    def render_retrieval_heatmap(self, session_id: str):
        """Which memories are recalled together"""
        ...

    def render_decay_curves(self, session_id: str):
        """Actual vs theoretical forgetting curves"""
        ...

    def render_interference_graph(self, session_id: str):
        """Network of memory conflicts and relationships"""
        ...
```
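
To make the timeline encoding concrete, here is a minimal sketch of the data `render_memory_timeline` could hand to a charting library, following the Memory Timeline view under Key Visualizations (bubble size = initial salience, color intensity = current strength). The `AnchorRecord` fields, the [0, 1] ranges, and the 30-pixel maximum radius are hypothetical; the real anchor payload stored in Qdrant may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class AnchorRecord:
    # Hypothetical shape of an anchor as read from storage
    anchor_id: str
    text: str
    created_at: datetime
    salience: float          # initial salience, assumed in [0, 1]
    current_strength: float  # current strength, assumed in [0, 1]


def timeline_points(anchors: List[AnchorRecord], max_radius: float = 30.0) -> List[Dict]:
    """Map anchors to bubble specs: x = creation time, size = salience, opacity = strength."""
    points = []
    for a in sorted(anchors, key=lambda r: r.created_at):
        points.append({
            "id": a.anchor_id,
            "x": a.created_at.isoformat(),
            "radius": max(1.0, a.salience * max_radius),        # bubble size from initial salience
            "opacity": min(1.0, max(0.0, a.current_strength)),  # color intensity, clamped to [0, 1]
            "hover": a.text,                                     # full anchor text shown on hover
        })
    return points
```

Keeping this transformation separate from rendering lets the same payload drive a Streamlit prototype in Phase 1 and a D3.js frontend in Phase 2.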

#### Key Visualizations
1. **Memory Timeline**
   - Horizontal timeline with anchors as bubbles
   - Bubble size = initial salience
   - Color intensity = current strength
   - Hover shows full anchor text and metadata

2. **Retrieval Heatmap** 
   - Matrix showing which memories are co-recalled
   - Identifies memory clusters and associations
   - Time-filtered views (last hour, day, week)

3. **Decay Visualization**
   - Multiple decay curves overlaid
   - Compare different memory types (if Issue #29 is implemented)
   - Show reactivation events as spikes (if Issue #28 is implemented)

4. **Semantic Space Projection**
   - 2D/3D projection of memory embeddings (t-SNE/UMAP)
   - Color by time, emotion, or memory type
   - Interactive exploration of semantic neighborhoods

5. **System Performance**
   - Real-time metrics: latency, throughput, error rates
   - Resource usage: Kafka lag, Qdrant memory, CPU/GPU
   - Worker health and processing queues
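
The retrieval heatmap reduces to a co-occurrence count: how often two anchors are returned by the same retrieval call, optionally restricted to a recent time window. A minimal sketch, assuming a hypothetical retrieval log in which each entry is `(timestamp, anchor IDs returned by one retrieval)`:

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Hypothetical retrieval log entry: (timestamp, anchor IDs returned by one retrieval call)
RetrievalEvent = Tuple[datetime, Sequence[str]]


def co_recall_matrix(
    events: List[RetrievalEvent],
    now: datetime,
    window: Optional[timedelta] = None,
) -> Tuple[List[str], List[List[int]]]:
    """Count how often each pair of anchors is returned by the same retrieval call.

    `window` limits the count to recent events (e.g. last hour, day, or week).
    Diagonal cells hold how often each anchor was recalled at all.
    """
    pair_counts: Counter = Counter()
    recall_counts: Counter = Counter()
    for ts, anchor_ids in events:
        if window is not None and ts < now - window:
            continue  # outside the selected time filter
        unique = sorted(set(anchor_ids))
        recall_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # each unordered pair once per event
    labels = sorted(recall_counts)
    index = {a: i for i, a in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, n in recall_counts.items():
        matrix[index[a]][index[a]] = n
    for (a, b), n in pair_counts.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = n  # symmetric
    return labels, matrix
```

Raw counts favor frequently recalled anchors; normalizing each cell by the diagonal (or using a measure such as Jaccard similarity) would make clusters of rarer memories easier to spot.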

### Part 2: Debugging and Analysis Tools

#### Memory Inspector CLI
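```bash
# Command-line tools for developers
# Usage: ./memory_inspector.py --session SESSION_ID <command> [args]
./memory_inspector.py --session SESSION_ID list                   # List all anchors
./memory_inspector.py --session SESSION_ID show ANCHOR_ID         # Show full anchor details
./memory_inspector.py --session SESSION_ID search "query"         # Search and show similarity scores
./memory_inspector.py --session SESSION_ID decay-simulate +30d    # Show predicted decay state
./memory_inspector.py --session SESSION_ID export --format json   # Export for external analysis
./memory_inspector.py --session SESSION_ID validate               # Check for inconsistencies
```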
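
The command surface above maps naturally onto `argparse` subcommands. This is a minimal sketch of argument parsing only, not the command handlers; the `+<n><h|d|w>` offset syntax is an assumption extrapolated from the `+30d` example, and `parse_offset`/`build_parser` are hypothetical names.

```python
import argparse
import re
from datetime import timedelta

_OFFSET = re.compile(r"^\+(\d+)([hdw])$")
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_offset(text: str) -> timedelta:
    """Parse a forward time offset such as '+30d', '+12h' or '+2w' (assumed syntax)."""
    match = _OFFSET.match(text)
    if not match:
        raise argparse.ArgumentTypeError(f"expected +<number><h|d|w>, got {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="memory_inspector.py")
    parser.add_argument("--session", required=True, help="session to inspect")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="List all anchors")
    show = sub.add_parser("show", help="Show full anchor details")
    show.add_argument("anchor_id")
    search = sub.add_parser("search", help="Search and show similarity scores")
    search.add_argument("query")
    decay = sub.add_parser("decay-simulate", help="Show predicted decay state")
    decay.add_argument("offset", type=parse_offset)  # '+30d' is positional: '+' is not a prefix char
    export = sub.add_parser("export", help="Export for external analysis")
    export.add_argument("--format", default="json", help="export format (JSON is the one named here)")
    sub.add_parser("validate", help="Check for inconsistencies")
    return parser
```

A `main()` would dispatch on `args.command`; keeping parsing separate makes the CLI contract testable without a running Qdrant or Kafka.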

#### Debugging Queries
```python
from __future__ import annotations  # Anchor and Conflict are project types defined elsewhere

from typing import Any, Dict, List


class MemoryAnalyzer:
    def find_orphaned_anchors(self, session_id: str) -> List[Anchor]:
        """Anchors never recalled after creation"""
        ...

    def detect_retrieval_anomalies(self, session_id: str) -> List[Dict[str, Any]]:
        """Unexpectedly high/low similarity scores"""
        ...

    def analyze_decay_deviations(self, session_id: str) -> Dict[str, Any]:
        """Where actual decay differs from theoretical"""
        ...

    def find_memory_conflicts(self, session_id: str) -> List[Conflict]:
        """Contradictory or interfering memories"""
        ...

    def compute_coherence_metrics(self, session_id: str) -> Dict[str, Any]:
        """Various measures of memory system health"""
        ...
```
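
Two of these queries can be sketched with the standard library alone. `find_orphaned_anchor_ids` assumes the caller supplies the set of anchor IDs seen in retrieval results after each anchor's creation; `detect_retrieval_anomalies` uses a plain z-score rule, which is one simple reading of "unexpectedly high/low", with an arbitrarily chosen threshold of 2. Both function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Set, Tuple


def find_orphaned_anchor_ids(anchor_ids: Iterable[str], recalled_ids: Set[str]) -> List[str]:
    """Anchors absent from the recalled set (assumed gathered from retrievals after creation)."""
    return sorted(a for a in anchor_ids if a not in recalled_ids)


def detect_retrieval_anomalies(
    scores: List[Tuple[str, float]], z_threshold: float = 2.0
) -> List[Dict]:
    """Flag similarity scores more than `z_threshold` standard deviations from the mean."""
    values = [s for _, s in scores]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all scores identical: nothing stands out
    anomalies = []
    for anchor_id, score in scores:
        z = (score - mu) / sigma
        if abs(z) > z_threshold:
            anomalies.append({
                "anchor_id": anchor_id,
                "score": score,
                "z": round(z, 2),
                "direction": "high" if z > 0 else "low",
            })
    return anomalies
```

With the population standard deviation, |z| can never exceed √(n − 1), so a batch of five or fewer scores cannot pass the default threshold; in practice the baseline should come from a longer history than a single retrieval call.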

#### A/B Testing Framework
```python
from __future__ import annotations  # Config, Experiment, ExperimentResults, Report defined elsewhere

from typing import List


class ExperimentFramework:
    def create_experiment(self, name: str, variants: List[Config]) -> Experiment:
        """Set up a controlled experiment with different configurations"""
        ...

    def assign_user_to_variant(self, session_id: str, experiment: str) -> str:
        """Random assignment with session tracking"""
        ...

    def collect_metrics(self, experiment: str) -> ExperimentResults:
        """Gather performance and quality metrics by variant"""
        ...

    def statistical_analysis(self, results: ExperimentResults) -> Report:
        """Significance testing, confidence intervals"""
        ...
```
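
For `assign_user_to_variant`, a common alternative to storing a random draw per session is to hash the experiment and session IDs: assignment stays sticky across restarts without a lookup table, and a good hash spreads sessions roughly evenly across variants. The sketch below swaps in that hash-based scheme and pairs it with a difference-in-means confidence interval using Welch's standard error and a normal approximation. Both are assumptions rather than the framework's specified method; for small samples a t-distribution or bootstrap interval would be more appropriate.

```python
import hashlib
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def assign_variant(session_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Deterministically map a session to a variant: same inputs always give the same variant."""
    digest = hashlib.sha256(f"{experiment}:{session_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def diff_in_means_ci(a: List[float], b: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Difference of means (b - a) with an approximate 95% confidence interval.

    Welch's standard error with a normal critical value; only reasonable for
    fairly large samples per variant.
    """
    diff = mean(b) - mean(a)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, diff - z * se, diff + z * se
```

Including the experiment name in the hash keeps assignments independent across concurrent experiments, so the same session is not always placed in the first variant of every test.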

### Part 3: Educational and Research Tools

#### Interactive Memory Simulator
```python
from __future__ import annotations  # Config and EducationalReport are project types defined elsewhere

from typing import List


class MemorySimulator:
    """Standalone tool for educational demos"""

    def load_scenario(self, scenario_file: str):
        """Load a pre-defined conversation scenario"""
        ...

    def step_through_time(self, days: int):
        """Show memory state at different time points"""
        ...

    def compare_configurations(self, configs: List[Config]):
        """Side-by-side comparison of different settings"""
        ...

    def generate_report(self) -> EducationalReport:
        """Summarize key learning points"""
        ...
```
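
A sketch of how `step_through_time` and `compare_configurations` could work together, assuming plain exponential decay parameterized by a half-life and a fixed recall threshold. `SimConfig`, its fields, and the decay form are hypothetical stand-ins; this issue does not specify the system's actual decay function.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimConfig:
    name: str
    half_life_days: float    # assumed exponential-decay parameter
    recall_threshold: float  # strength below which a memory counts as forgotten


def strength_after(initial: float, days: float, half_life_days: float) -> float:
    """Exponential decay: strength halves every `half_life_days`."""
    return initial * 0.5 ** (days / half_life_days)


def step_through_time(
    initial_strengths: Dict[str, float], days: List[int], configs: List[SimConfig]
) -> Dict[str, Dict[int, List[str]]]:
    """For each config and time point, list the anchors still at or above the recall threshold."""
    result: Dict[str, Dict[int, List[str]]] = {}
    for cfg in configs:
        result[cfg.name] = {
            d: sorted(
                anchor
                for anchor, s0 in initial_strengths.items()
                if strength_after(s0, d, cfg.half_life_days) >= cfg.recall_threshold
            )
            for d in days
        }
    return result
```

For a classroom demo, printing the result side by side for two half-lives shows directly how a faster decay setting drops weakly encoded memories first.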

#### Research Analysis Notebooks
```
# Jupyter notebooks for common analyses
notebooks/
  memory_system_analysis.ipynb       # Basic system behavior
  decay_curve_fitting.ipynb          # Statistical model validation  
  retrieval_pattern_analysis.ipynb   # User behavior patterns
  comparative_evaluation.ipynb       # A/B test analysis
  longitudinal_study.ipynb           # Long-term behavior tracking
```
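
The `decay_curve_fitting.ipynb` notebook needs some way to estimate a decay rate from observed strengths. One simple option, shown here as an assumption rather than the planned method, is ordinary least squares on log-strength, which fits s(t) = s0 * exp(-k * t); the half-life then follows as ln(2) / k.

```python
import math
from typing import List, Tuple


def fit_exponential_decay(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit s(t) = s0 * exp(-k * t) by least squares on log(s); returns (s0, k).

    Requires positive strengths and at least two distinct time points. Fitting in
    log space weights small strengths heavily, so treat the result as a first-pass
    estimate before a nonlinear fit.
    """
    xs = [t for t, _ in points]
    ys = [math.log(s) for _, s in points]
    n = len(points)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope
```

Comparing the fitted k against the configured rate is one way to quantify the "actual vs theoretical" deviation the dashboard and `analyze_decay_deviations` aim to surface.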

## Implementation Plan

### Phase 1: Basic Observability (3 weeks)
1. **Metrics Collection**: Add structured logging to all workers
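   ```python
   import logging
   import time

   logger = logging.getLogger(__name__)

   # Example metrics: with stdlib logging, structured fields go in `extra=`
   # (a dict passed positionally is used for %-formatting and silently dropped);
   # a JSON formatter is needed to emit these fields
   logger.info("anchor_created", extra={
       "anchor_id": anchor_id,
       "session_id": session_id,
       "text_length": len(text),
       "embedding_model": model,
       "timestamp": time.time(),
   })
   ```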

2. **Data Pipeline**: Stream metrics to ClickHouse/PostgreSQL for analysis

3. **Basic Dashboard**: Simple Streamlit app with timeline and list views
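
With Python's standard `logging` module, structured fields are attached to a record through the `extra=` argument, and a formatter has to serialize them. The sketch below emits one JSON object per line, a format that is straightforward to load into ClickHouse or PostgreSQL. The logger name `memory.metrics` and the `METRIC_FIELDS` whitelist are hypothetical choices.

```python
import json
import logging

# Fields passed via `extra=` that should appear in the JSON output (assumed list)
METRIC_FIELDS = ("anchor_id", "session_id", "text_length", "embedding_model", "timestamp")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname, "logger": record.name}
        for field in METRIC_FIELDS:
            if hasattr(record, field):  # `extra` keys become attributes on the record
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def make_metrics_logger(stream) -> logging.Logger:
    """Configure a logger that writes JSON lines to `stream` and does not propagate."""
    logger = logging.getLogger("memory.metrics")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A worker would then call `logger.info("anchor_created", extra={"anchor_id": ..., "session_id": ...})`; a whitelist avoids leaking arbitrary record attributes into the analytics store.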

### Phase 2: Advanced Visualization (3 weeks)
1. **Interactive Dashboard**: React/Vue.js frontend with D3.js visualizations
2. **Real-time Updates**: WebSocket connection for live monitoring
3. **Semantic Space Visualization**: Integration with dimensionality reduction

### Phase 3: Analysis Tools (2 weeks)
1. **CLI Inspector**: Command-line interface for developers
2. **Debugging Queries**: Automated anomaly detection
3. **Export/Import**: Data portability for external analysis

### Phase 4: Research Framework (3 weeks)
1. **A/B Testing**: Experiment management and statistical analysis
2. **Educational Tools**: Simplified interface for teaching
3. **Analysis Notebooks**: Pre-built research templates

## Technical Architecture

### Data Collection
```yaml
logging_config:
  structured_logging: true
  outputs:
    - console  # Development
    - file     # Production logs
    - database # Analytics database
  
metrics:
  collection_interval: 1s
  retention_period: 90d
  
tracing:
  enable_distributed_tracing: true
  sample_rate: 0.1  # 10% of requests
```
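
The `sample_rate: 0.1` setting can be honored deterministically by hashing the trace ID, so every worker that sees a given trace makes the same keep/drop decision and sampled traces stay complete across Kafka hops. OpenTelemetry's SDK provides a trace-ID-ratio sampler built on a similar idea; the functions below are an illustrative stand-in, and the duration syntax accepted by `parse_duration` (`1s`, `90d`) is assumed from the config values above.

```python
import hashlib

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Keep a trace if its hashed ID falls in the lowest `sample_rate` fraction of 32-bit space."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest()[:8], 16)  # 0 .. 2**32 - 1
    return bucket < sample_rate * 0x100000000


def parse_duration(value: str) -> int:
    """Convert config durations such as '1s' or '90d' to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]
```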

### Storage
```python
# Time-series database for metrics
class MetricsStore:
    """Facade over the storage backends:

    - ClickHouse or InfluxDB for performance data
    - PostgreSQL for relational queries
    - Redis for real-time caching
    """
```

### Dashboard Stack
```
Frontend: React + D3.js + WebSockets
Backend: FastAPI + Pydantic
Database: ClickHouse (metrics) + PostgreSQL (metadata)
Caching: Redis
Deployment: Docker Compose (development) + Kubernetes (production)
```

## Research Value

- **Transparency**: Makes memory system behavior visible and interpretable
- **Validation**: Enables rigorous experimental validation
- **Education**: Students can see psychological memory principles in action
- **Development**: Faster iteration through better debugging tools
- **Reproducibility**: Detailed logging enables replication of experiments

## Integration Points

### Existing System
- **All Workers**: Add metrics collection without changing core logic
- **Qdrant**: Query for anchor metadata and similarity analysis
- **Kafka**: Monitor message flow and processing latencies
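
Kafka consumer lag, one of the resource metrics listed for the dashboard, is the per-partition difference between the log-end offset and the consumer group's committed offset. The sketch assumes both offset maps have already been fetched (for example through a Kafka admin client, not shown); counting a partition with no commit from offset 0 is a simplification, since the actual starting point depends on retention and the consumer's `auto.offset.reset` setting.

```python
from typing import Dict, Optional


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Per-partition lag = log-end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition)
        # Simplification: no committed offset means lag is counted from offset 0
        lag[partition] = end - (done if done is not None else 0)
    return lag
```

Summing the per-partition values gives a single topic-level lag figure for the System Performance panel; a steadily growing total usually means workers are falling behind.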

### Future Enhancements
- **Issue #28 (Reactivation)**: Visualize strengthening events
- **Issue #29 (Adaptive Decay)**: Compare decay rates by memory type
- **Issue #31 (Multi-modal)**: Show image/audio memories in timeline
- **Issue #32 (Interference)**: Network visualization of memory conflicts

## Complexity Estimate
**Medium-High** - Requires full-stack development, data engineering, and statistical analysis tools.

## Success Metrics

- **Developer Productivity**: Time to debug issues (target: 50% reduction)
- **Research Quality**: Number of insights gained from visualization
- **Educational Impact**: Student understanding scores in memory system courses
- **System Reliability**: Early detection of performance issues

## Related Tools

- **Similar Systems**: MLflow, Weights & Biases (ML experiment tracking)
- **Visualization**: Observable, Plotly Dash, Grafana
- **Memory Research**: Tools from cognitive psychology labs

## References
- `research/experimental_methodology.md` - Evaluation framework context
- `research/threats_to_validity.md` - Observability needs for validation
- `convai_narrative_memory_poc/workers/` - Integration points for metrics
- All other issues - Enhanced visibility into proposed features

Enhancement: Memory Observability and Debugging Tools #33
