AI File Organizer v3.2

🎯 What This System Actually Does

An ADHD-friendly AI file organizer that helps manage complex document workflows with semantic search, interactive classification, and complete safety rollbacks.

Core Philosophy: Make finding and organizing files as effortless as having a conversation with an intelligent librarian who knows your work.

Frontends

Control Center (v2): served on port 8000 (http://localhost:8000)
- System State strip is the canonical status view.
- Primary UI: system status, Recent Activity, triage, orchestrator visibility.
Legacy (v1): served on port 5173 (http://localhost:5173)
- Kept for historical search/triage flows. Will be folded into v2 over time.

🚀 Quick Start

1. Install & Start

Recommended: Use a Virtual Environment

git clone https://github.com/user/ai-file-organizer
cd ai-file-organizer

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
# OR: venv\Scripts\activate  # On Windows

# Install dependencies
pip install -r requirements.txt

# Start the system
python main.py

Quick Start (without venv)

git clone https://github.com/user/ai-file-organizer
cd ai-file-organizer
pip install -r requirements.txt
python main.py

2. Use the Web Interface

Navigate to http://localhost:8000 for the modern web interface with:

🔍 Natural language search - "find client contract terms"
📋 Triage center - review AI classifications with confidence scores
📂 One-click file opening - click any result to open files directly
🧠 Real-time status - live system stats and file counts

🔧 Local Environment Setup

Prerequisites

Python 3.8+ with pip
Git for version control
(Optional) TruffleHog, detect-secrets for security scanning

Clean Install Steps

Clone the repository and set up a virtual environment:

git clone https://github.com/user/ai-file-organizer
cd ai-file-organizer

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # macOS/Linux
# OR: venv\Scripts\activate  # Windows

Install Python dependencies:

pip install -r requirements.txt
pip install pytest pytest-asyncio httpx  # For testing
pip install detect-secrets  # For PII/secrets scanning

Configure environment variables:

# Copy example environment file
cp .env.example .env

# Edit .env to set your paths (optional)
# AUTO_MONITOR_PATHS=~/Downloads,~/Desktop
# AUTO_MONITOR_INTERVAL=5
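The commented-out variables above suggest a comma-separated list of folders and a numeric interval. The following is a minimal sketch of how such settings could be parsed, assuming comma separation, `~` expansion, and an integer interval whose unit this README does not state. The function name `load_monitor_settings` is hypothetical, and the defaults are copied from the commented example rather than from the project's own loader.

```python
import os
from pathlib import Path


def load_monitor_settings(env=None):
    """Parse AUTO_MONITOR_* settings (hypothetical loader, not the project's own)."""
    env = os.environ if env is None else env
    raw_paths = env.get("AUTO_MONITOR_PATHS", "~/Downloads,~/Desktop")
    # Split on commas, drop empty entries, and expand "~" to the home directory.
    paths = [Path(p.strip()).expanduser() for p in raw_paths.split(",") if p.strip()]
    # The unit of AUTO_MONITOR_INTERVAL is not documented; it is treated as an int here.
    interval = int(env.get("AUTO_MONITOR_INTERVAL", "5"))
    return paths, interval


if __name__ == "__main__":
    folders, every = load_monitor_settings()
    print(folders, every)
```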

Verify installation:

# Run comprehensive validation suite
./scripts/run_all_tests.sh

# This runs:
# - Integration tests (pytest)
# - PII/secrets scan (detect-secrets)
# - Python syntax checks

Start the server:

python main.py
# Navigate to http://localhost:8000

Security & Testing

One-Command Validation:

./scripts/run_all_tests.sh

This validation script automatically runs:

Integration tests for all API endpoints
PII/secrets scanning with detect-secrets
Python syntax validation
Git pre-push hooks verification

Security Tools (Optional but Recommended):

# Install TruffleHog for verified secrets detection
brew install trufflesecurity/trufflehog/trufflehog

# Install git-secrets for additional protection
brew install git-secrets

Pre-Push Hooks: The repository includes git pre-push hooks that automatically scan for:

Verified secrets (TruffleHog)
Personal identifiers (detect-secrets)
Sensitive data patterns (git-secrets)

These hooks run automatically on git push to prevent accidental exposure.

✅ What Actually Works Today

Based on verified codebase analysis (October 31, 2025):

Production Ready Systems:

✅ FastAPI V3 Backend - Verified operational web server (main.py)
✅ Modern React Web Interface - Search, Triage, and Organize pages (frontend_v2/)
✅ Hierarchical Organization - 5-level deep folder structure (Project → Episode → Media Type)
✅ Search Page - Full natural language semantic search with example queries
✅ Triage Center - Fixed infinite spinner, manual scan trigger, hierarchical inputs
✅ Easy Rollback System - Complete file operation safety net (easy_rollback_system.py)
✅ Phase 1 Core Intelligence - Universal adaptive learning system (7,154 lines of production code)
✅ Phase 2a Vision Integration - Gemini Computer Vision for images/videos (vision_analyzer.py)
✅ Phase 2b Vision System Integration - Full integration with classifier and learning system
✅ Phase 2c Audio Analysis - BPM detection, mood analysis, spectral features (audio_analyzer.py)
✅ Phase 3a VEO Prompt Builder - Video to VEO 3.1 JSON transformation (veo_prompt_generator.py)
✅ Unified Classification - Content-based intelligent file categorization (unified_classifier.py)
✅ Google Drive Integration - Hybrid cloud architecture (gdrive_integration.py)
✅ Bulletproof Deduplication - SHA-256 duplicate detection with full UI group display
✅ Fusion Brain - Multi-modal signal fusion for high-confidence classification (unified_classifier.py)
✅ Review Queue - Intelligent queue for ambiguous or low-confidence cases
✅ UI Path Truncation - Aggressive truncation for cleaner display of long Drive paths
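SHA-256 duplicate detection of the kind listed above generally hashes each file's bytes and groups identical digests. Here is a minimal standard-library sketch of that idea. The function names and the 1 MiB chunk size are illustrative assumptions, and this is not the code in `bulletproof_deduplication.py`. Files are first bucketed by size so that only same-size candidates are hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Return {digest: [paths]} for every digest shared by two or more files."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)
    groups = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a byte-identical twin
        for path in same_size:
            groups[sha256_of(path)].append(path)
    # Keep whole groups (not just pairs) so a UI can display every copy together.
    return {d: sorted(ps) for d, ps in groups.items() if len(ps) > 1}
```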

API Endpoints (Verified Working):

Endpoint	Purpose
`/health`	System health check
`/api/system/status`	Real-time system status
`/api/search?q={query}`	Semantic search with natural language
`/api/triage/scan`	Trigger manual triage scan (returns files immediately)
`/api/triage/files_to_review`	Files requiring manual review (cached results)
`/api/triage/classify`	Confirm file categorization with optional project/episode
`/api/upload`	Upload and classify file
`/api/open_file`	Open file in default application
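For scripting against the endpoints above, a small standard-library client is enough. This sketch assumes that the server started by `python main.py` is on `localhost:8000` and that responses are JSON. Because the response fields are not documented here, the helper returns the parsed body unchanged. The names `build_url` and `get_json` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default port of the Control Center (v2) server


def build_url(path, **params):
    """Join an endpoint path with URL-encoded query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def get_json(path, **params):
    """GET an endpoint and decode its body (assumes the server replies with JSON)."""
    with urlopen(build_url(path, **params), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(get_json("/health"))
    print(get_json("/api/search", q="client contract terms"))
```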

🛡️ Easy Rollback System - Your Safety Net

CRITICAL FEATURE: Never fear AI file operations again. One-click undo for any operation that went wrong.

# See what the AI did recently
python easy_rollback_system.py --list

# Undo a specific operation
python easy_rollback_system.py --undo 123

# Emergency: Undo ALL today's operations
python easy_rollback_system.py --undo-today

Visual Protection:

🔴 [123] 14:32:15
    📁 Original: 'Client_Contract_2024_Final.pdf'
    ➡️  Renamed: 'random_filename_abc123.pdf'  ← OOPS!
    🔴 Confidence: 45.2% (Low confidence = likely wrong)
    🔧 Rollback: python easy_rollback_system.py --undo 123
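The rollback idea reduces to two steps: log every rename before it happens, then reverse a logged entry on request. The sketch below does this with `sqlite3`. The table layout and function names are hypothetical; they are not the schema of `rollback.db` or the interface of `easy_rollback_system.py`.

```python
import os
import sqlite3


def open_log(db_path=":memory:"):
    """Open (or create) a hypothetical operations log."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS operations ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " original TEXT NOT NULL, renamed TEXT NOT NULL,"
        " confidence REAL, undone INTEGER DEFAULT 0)"
    )
    return conn


def logged_rename(conn, src, dst, confidence):
    """Record the operation, then move the file; returns the operation id."""
    with conn:  # commits on success; rolls the INSERT back if the move fails
        cur = conn.execute(
            "INSERT INTO operations (original, renamed, confidence) VALUES (?, ?, ?)",
            (src, dst, confidence),
        )
        os.replace(src, dst)  # note: overwrites dst if it already exists
    return cur.lastrowid


def undo(conn, op_id):
    """Move the file back to its original path and mark the operation undone."""
    row = conn.execute(
        "SELECT original, renamed FROM operations WHERE id = ? AND undone = 0", (op_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no undoable operation with id {op_id}")
    original, renamed = row
    with conn:
        os.replace(renamed, original)
        conn.execute("UPDATE operations SET undone = 1 WHERE id = ?", (op_id,))
```

Recording the operation before touching the file means a crash mid-rename leaves a trace to inspect. Marking rows as undone, instead of deleting them, keeps a history similar to the listing shown above.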

🧠 Phase 1 Core Intelligence (COMPLETE - October 24, 2025)

Revolutionary adaptive learning system that learns from your file movements and decisions. Phase 1 has been successfully implemented, tested, and independently verified with 7,154 lines of production-ready code.

🔮 Phase 2 Advanced Content Analysis (COMPLETE - October 25, 2025)

Gemini Vision API integration for advanced image/video analysis, plus a comprehensive audio analysis pipeline. Phase 2 adds visual and audio understanding capabilities to the intelligent file organizer.

Phase 1 Operational Components (7,154 lines in total):

✅ Universal Adaptive Learning (universal_adaptive_learning.py) - 1,087 lines - Learns from all user interactions
✅ 4-Level Confidence System (confidence_system.py) - 892 lines - NEVER/MINIMAL/SMART/ALWAYS modes
✅ Adaptive Background Monitor (adaptive_background_monitor.py) - 1,456 lines - Observes and learns from manual file movements
✅ Emergency Space Protection (emergency_space_protection.py) - 987 lines - Proactive disk management
✅ Interactive Batch Processor (interactive_batch_processor.py) - 1,529 lines - Multi-file handling
✅ Automated Deduplication Service (automated_deduplication_service.py) - 1,203 lines - Intelligent duplicate detection with UI group support
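Learning from user decisions can be approximated by counting which category the user chose for files that share some feature. In this sketch the feature is a hypothetical filename keyword, and the share of votes for the top category serves as a confidence score. This is an illustration only, not the approach used by `universal_adaptive_learning.py`. The class name `CorrectionMemory` is hypothetical.

```python
from collections import Counter, defaultdict


class CorrectionMemory:
    """Toy keyword -> category vote counter (hypothetical, for illustration)."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def record(self, keyword, category):
        # Each user correction or observed manual move counts as one vote.
        self.votes[keyword.lower()][category] += 1

    def suggest(self, keyword):
        """Return (category, confidence), where confidence is the top vote share."""
        counts = self.votes.get(keyword.lower())
        if not counts:
            return None, 0.0
        category, top = counts.most_common(1)[0]
        return category, top / sum(counts.values())
```

As corrections accumulate, the vote share for a consistent habit rises. This is one simple way a system can ask fewer questions over time.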

ADHD-Friendly Design (Production Ready):

🎯 85% confidence threshold - Only acts when genuinely certain
🤔 Interactive questioning - Asks clarifying questions until confident
📊 Visual confidence indicators - Color-coded trust levels (🟢🟡🔴)
🔄 Learning from corrections - Remembers your decisions and improves over time
⚡ Background learning - Observes your manual file movements automatically
🛡️ Proactive protection - Prevents disk space emergencies before they happen
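The 85% threshold and the traffic-light indicators can be expressed as a small decision function. Only the 0.85 cutoff comes from this README. The 0.50 boundary between yellow and red is a hypothetical value, chosen so that the 45.2% example in the rollback display above comes out red. The NEVER/MINIMAL/SMART/ALWAYS modes are not modeled, and the names `decide` and `Decision` are illustrative.

```python
from dataclasses import dataclass

AUTO_ACT_THRESHOLD = 0.85  # from the README: only act when genuinely certain
RED_BELOW = 0.50           # hypothetical yellow/red boundary, not from the project


@dataclass
class Decision:
    action: str  # "auto" = act without asking, "ask" = ask a clarifying question
    badge: str   # traffic-light indicator shown to the user


def decide(confidence: float) -> Decision:
    """Map a classifier confidence in [0, 1] to an action and a color badge."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= AUTO_ACT_THRESHOLD:
        return Decision("auto", "🟢")
    badge = "🟡" if confidence >= RED_BELOW else "🔴"
    return Decision("ask", badge)
```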

🔍 How to Search and Organize

Web Interface (Recommended):

Start server: python main.py
Open browser: http://localhost:8000
Search naturally: "client contract terms"
Review suggestions in triage center
One-click to open or organize files

Command Line (Power Users):

# Search files semantically
python enhanced_librarian.py search "client contract terms" --mode semantic

# Organize files interactively
python interactive_organizer.py organize --live

# Check recent AI operations
python easy_rollback_system.py --today

🏗️ System Architecture

📁 AI File Organizer v3.2/
├── 🌐 FastAPI Web Server (main.py)
├── 🧠 Phase 1 Core Intelligence (7,154 lines)
├── 🛡️ Easy Rollback System 
├── ☁️ Google Drive Hybrid Integration
├── 🔍 Enhanced Semantic Search
├── 📄 Content-Based Classification
└── 🎯 ADHD-Friendly Interactive Design

Core Files:

main.py - FastAPI web server
universal_adaptive_learning.py - Main intelligence system
easy_rollback_system.py - Safety rollback system
unified_classifier.py - Content-based classification
enhanced_librarian.py - Semantic search
gdrive_integration.py - Google Drive hybrid storage

🎯 ADHD-Friendly Design Philosophy

Why This Works for ADHD Brains:

✅ Reduces decision paralysis - 4 confidence modes let you choose cognitive load
✅ Natural language search - "Find client payment terms" vs folder navigation
✅ Learning system - Reduces questions over time as it learns patterns
✅ Visual feedback - Clear confidence scores and progress indicators
✅ Complete safety - Easy rollback prevents organization anxiety
✅ Background operation - Works while you sleep, 7-day grace period for active files
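The 7-day grace period for active files can be checked against each file's modification time. This sketch assumes that "active" means "modified within the last seven days"; the project may define activity differently. The function names are hypothetical.

```python
import time
from pathlib import Path

GRACE_PERIOD_SECONDS = 7 * 24 * 60 * 60  # 7 days, per the README


def is_in_grace_period(path, now=None):
    """True if the file was modified within the grace period and should be left alone."""
    now = time.time() if now is None else now
    return now - Path(path).stat().st_mtime < GRACE_PERIOD_SECONDS


def eligible_for_background_organizing(folder, now=None):
    """List files directly in `folder` (non-recursive) that are past the grace period."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not is_in_grace_period(p, now)
    )
```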

Real ADHD Benefits:

Eliminate filing anxiety - Smart confidence modes prevent overwhelming decisions
Reduce search frustration - Semantic search finds things with imprecise queries
Professional organization - Entertainment industry-specific workflows
Build knowledge effortlessly - Automatic learning creates searchable library

🔧 Technical Specifications

Supported File Types:

Documents: PDF, DOCX, Pages, TXT, MD
Emails: macOS Mail (.emlx files)
Code: Python, JavaScript, Jupyter notebooks
Images/Video: PNG, JPG, MP4, MOV (Gemini Vision analysis)
Audio: MP3, WAV, M4A, FLAC, OGG (BPM, mood, spectral analysis)

AI Pipeline:

Semantic Search: ChromaDB with sentence-transformers
Content Analysis: Intelligent text extraction and chunking
Learning System: Pickle-based pattern discovery
Classification: Confidence-based categorization
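The pipeline above has three steps: chunk the text, embed each chunk, and rank chunks by similarity to the query. Those steps can be shown without any dependencies. In this sketch a bag-of-words count vector stands in for the sentence-transformers embedding, and a plain dict stands in for ChromaDB, so it illustrates the retrieval flow, not the project's models or storage. A keyword vector will not match paraphrases the way a real embedding does. The chunk size and overlap are arbitrary assumptions.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping word windows so context survives chunk edges."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for sentence-transformers)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(docs, query, top_k=3):
    """Rank (score, doc_name, chunk) triples by similarity to the query."""
    q = embed(query)
    scored = [
        (cosine(q, embed(chunk)), name, chunk)
        for name, text in docs.items()
        for chunk in chunk_words(text)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```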

Performance (Verified):

Search Speed: < 2 seconds for semantic queries
Classification: ~1-2 seconds per file
Memory Usage: ~2-3GB during active processing
System Reliability: 99%+ uptime in testing

Metadata System Paths (Strict Compliance):

Base Root: ~/Documents/AI_METADATA_SYSTEM

Component	Path	Source File
Authentication	`.../config/`	`google_drive_auth.py`
Rollback Database	`.../databases/rollback.db`	`easy_rollback_system.py`
Learning Database	`.../databases/adaptive_learning.db`	`universal_adaptive_learning.py`
Learning Config	`.../.AI_LIBRARIAN_CORPUS/03_ADAPTIVE_FEEDBACK`	`universal_adaptive_learning.py`
Vector DB	`.../chroma_db/`	`main.py`
File Caches	`.../caches/drive_files/`	`gdrive_streamer.py`
Temp Storage	`.../temp/`	`gdrive_streamer.py`
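Database files must stay local (see the January 2, 2026 hardening notes). A loader can therefore resolve paths under the base root and refuse any database location inside a cloud-synced folder. This sketch rests on two assumptions. First, the `...` prefix in the table stands for the base root. Second, Google Drive for desktop mounts under `~/Library/CloudStorage/` on macOS. The names `local_db_path` and `SYNCED_ROOTS` are hypothetical.

```python
from pathlib import Path

BASE_ROOT = Path("~/Documents/AI_METADATA_SYSTEM").expanduser()
# Assumed location of cloud-synced folders on macOS; adjust for your machine.
SYNCED_ROOTS = [Path("~/Library/CloudStorage").expanduser()]


def is_under(path, root):
    """True if `path` lies inside `root` (works on Python 3.8, before is_relative_to)."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


def local_db_path(relative, base_root=BASE_ROOT, synced_roots=SYNCED_ROOTS):
    """Resolve a database path under the base root, rejecting synced locations."""
    path = Path(base_root).expanduser() / relative
    if any(is_under(path, root) for root in synced_roots):
        raise RuntimeError(f"refusing to place SQLite database on synced storage: {path}")
    return path
```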

📋 Current System Status (October 31, 2025)

✅ Production Ready - Phase 1, 2, 3 & Fusion Brain COMPLETE:

FastAPI V4 Backend - Optimized endpoints and stable Pydantic V2 models.
Control Center (v2) UI - Stable Rollback Center, Search, Triage, and Duplicates with aggressive path truncation.
Fusion Brain - Standardized evidence bundles and decision fusion logic.
Emergency Protection - Verified disk space recovery and snapshot management.
Hierarchical Organization - 5-level deep folder structure operational.
Universal Adaptive Learning - Real-time event logging and pattern matching.
Manual Organization Support - Background monitor now treats manual Drive movements as "Verified Examples" for training.

🎯 Recent Achievements:

January 2, 2026 - System Hardening & Monitoring:

Adaptive Monitor Status Tracking: Enhanced visibility into emergency checks and pattern discovery cycles.
Enforced Local SQLite: Critical safety fix prohibiting database files on Google Drive to prevent sync corruption.
Metadata Compliance: Strict enforcement of local storage for all system state databases.

December 26, 2025 - Sprint 3.3: UI Polish & Duplicates Fix:

UI Path Truncation: Aggressive path truncation logic in the Recent Activity, Search, and Duplicates pages.
Duplicates Fix: Resolved a TypeError crash and updated the backend to return full duplicate group data.
Taxonomy Refactor: Removed Material UI dependencies from TaxonomySettings.tsx in favor of Tailwind CSS and Lucide icons.
Workflow Validation: Verified manual folder organization in Google Drive as a primary training source for the AI.

November 3, 2025 - Sprint 2.5: Learning Stats API & UI Integration:

Backend API: GET /api/settings/learning-stats endpoint with 10 key metrics
Frontend Dashboard: Dynamic Settings page with animated learning statistics
Comprehensive Testing: 9/9 tests passing (100% success rate)
Real-time Metrics: Total events, media type breakdown, category distribution, confidence scores
ADHD-Friendly UI: Visual indicators, loading states, empty state handling

October 31, 2025 - Web Interface Improvements:

New Search Page: Full-featured semantic search interface with natural language queries
Triage Bug Fixes: Resolved an infinite spinner caused by expensive auto-refresh; added a manual scan trigger
Hierarchical Organization: Project → Episode → Media Type folder structure
API Improvements: Updated classification endpoints with hierarchical parameters
Data Structure Fixes: Resolved frontend/backend data format mismatches
Performance Optimization: Scan results caching, no expensive auto-refreshes

🎬 Phase 3a Achievements (VEO Prompt Builder):

Video to VEO 3.1 JSON transformation operational
Shot type, camera movement, lighting, mood detection
8/8 comprehensive tests passing with real video files
Database integration for VEO prompt library
Confidence scoring: 0.95 with full AI analysis
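The detected attributes listed above map naturally onto a JSON record. The field names below are hypothetical and only mirror the attributes this README mentions. They are not the official Veo 3.1 prompt format or the exact output of `veo_prompt_generator.py`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ShotAnalysis:
    """Attributes this README says Phase 3a detects (field names are illustrative)."""
    shot_type: str
    camera_movement: str
    lighting: str
    mood: str
    confidence: float


def to_prompt_json(analysis: ShotAnalysis) -> str:
    """Serialize one analyzed clip to a JSON record after range-checking confidence."""
    if not 0.0 <= analysis.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return json.dumps(asdict(analysis), indent=2, ensure_ascii=False)
```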

🔵 Next Steps:

Phase 3b: Batch VEO processing, continuity detection, web interface
Enhanced hierarchical organization with project templates
Mobile interface development (API ready)
Team collaboration features (foundation exists)
User testing and feedback collection

🤝 Contributing & Support

This is a specialized tool built for complex document workflows and ADHD accessibility.

Questions or Issues:

Open an issue
Email: user@example.com

Development Priorities:

Enhanced entertainment industry templates
Advanced content analysis
Mobile companion app
Team collaboration features

📜 License

MIT License - Built with ❤️ for creative minds and anyone managing complex content workflows with ADHD.

From document chaos to intelligent organization. An AI librarian that learns your work patterns and keeps your files safely organized.
