A Llama-powered personal memory system that processes Meta Ray-Ban glasses footage to help you recall conversations, insights, and daily moments through intelligent video analysis.
Built for the Llama Hackathon - Your AI companion for augmented memory
- ✅ Complete video processing pipeline for Meta glasses footage
- ✅ Advanced transcription with OpenAI Whisper (5 models)
- ✅ Smart frame extraction optimized for conversations
- ✅ Llama 4 integration with multimodal understanding
- ✅ Natural language querying via API
- ✅ Batch processing for multiple videos
- ✅ WebAI Navigator elements
Due to the 24-hour hackathon constraint, we have a working MVP with manual video transfer:
Current Workflow:
- 📹 Capture: Videos recorded with Meta Ray-Ban glasses
- 📱 Transfer: Manual upload to processing system (Google Drive)
- ⚡ Process: Automatic transcription + frame extraction
- 🧠 Analyze: Llama processes and stores all context
- 💬 Query: Ask questions about your conversations
Planned post-hackathon:
- 📲 Direct API integration with Meta glasses companion app
- 🔄 Real-time processing as videos are captured
- ☁️ Seamless sync without manual file transfer
- 📱 Mobile app for instant querying on-the-go
Meta Glasses Recording → Manual Upload → Our Pipeline → Intelligent Memory
↓ ↓ ↓ ↓
"Conversation" → video.mp4 → AI Analysis → "What did we discuss?"
Built as a complete WebAI Navigator ecosystem showcasing private AI capabilities:
- ✅ 4 Production-Ready Elements: Complete pipeline implemented as WebAI elements
- ✅ Private Processing: All video analysis happens locally - no data leaves your infrastructure
- ✅ Enterprise-Ready: Built with WebAI Element SDK for production deployment
- ✅ Modular Architecture: Each component can be used independently or together
Element ID: custom-video-processor
- Advanced WebAI Element SDK implementation
- Configurable UI settings for batch processing
- Structured Frame output for WebAI ecosystem
- Stay-alive mode for continuous processing
- Full setup.py and requirements for production deployment
Element ID: media_loader (UUID: 1916c9ba-fca7-4ed3-b773-11f400def123)
- Universal media input handler for WebAI workflows
- Real-time frame rate control
- OpenCV integration with WebAI Frame format
- Support for video files and image directories
- Stay-alive capability for live processing
Element ID: llama4 (UUID: e54b5bf8-f954-4dba-a111-c45728c46e8e)
- Advanced multimodal AI element
- Smart batching system (max 8 attachments per message)
- Context management across conversations
- Both Llama-4-Maverick and Llama-4-Scout support
- Memory-efficient chat history management
Element ID: api (UUID: 68f81646-53de-4952-b171-6ee7cdbd9fb0)
- OpenAI-compatible API server element
- Complete /v1/chat/completions implementation
- Performance metrics (TPS, TTFT) tracking
- Queue management for concurrent requests
- CORS support for web integration
WebAI Navigator Workflow:
Media Loader → Custom Video Processor → Llama 4 → API Server
↓ ↓ ↓ ↓
Load videos → Extract+Transcribe → AI Analysis → External Access
- 🔒 Data Privacy: All processing stays within your WebAI environment
- 💰 Cost Control: No unpredictable cloud bills - use existing infrastructure
- 🎛️ Full Control: Own your models, data, and processing pipeline
- ⚡ Performance: Optimized for local processing with WebAI's capabilities
# Advanced WebAI patterns implemented:
- Context[Inputs, Outputs, Settings] typing
- @element.executor decorators
- ElementSettings with validation
- ElementOutputs with structured data
- Async generators for streaming
- Frame object manipulation
- Color space handling
- Settings Validation: Type-safe settings with hints and validation
- Error Handling: Comprehensive logging and exception management
- Performance Optimization: Efficient batching and memory management
- Configurability: Full UI configuration in WebAI Navigator
- Documentation: Complete setup instructions and usage examples
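To make these patterns concrete, here is a minimal, illustrative element skeleton. The type and function names are stand-ins modeled on the pattern list above, not the verbatim WebAI Element SDK imports:

```python
# Illustrative sketch only: Settings/executor stand in for the SDK's
# ElementSettings / @element.executor pattern; the real imports differ.
from dataclasses import dataclass
from pathlib import Path
from typing import Any, AsyncIterator, Dict

@dataclass
class Settings:                          # plays the role of ElementSettings
    video_directory: str = "./glasses-videos"
    whisper_model: str = "base"          # tiny -> large
    frame_interval: float = 30.0         # seconds between sampled frames

async def executor(settings: Settings) -> AsyncIterator[Dict[str, Any]]:
    """Async generator in the style of an @element.executor: streams one
    structured message per transcript or frame to downstream elements."""
    for video in sorted(Path(settings.video_directory).glob("*.mp4")):
        # A real element would run Whisper + OpenCV here; stubbed for brevity.
        yield {"media_type": "video_transcription",
               "video_name": video.name,
               "whisper_model": settings.whisper_model}
        yield {"media_type": "video_frame",
               "video_name": video.name,
               "frame_data": {"timestamp": 0.0}}
```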
Our WebAI implementation enables:
Meta Glasses Video Input
↓
WebAI Media Loader (universal input handling)
↓
WebAI Video Processor (specialized conversation processing)
↓
WebAI Llama Integration (multimodal AI with memory)
↓
WebAI API Server (OpenAI-compatible access)
↓
Natural Language Memory Queries
- Local Processing: Meta glasses footage never leaves your environment
- Scalable Architecture: Handle unlimited daily footage privately
- Enterprise Ready: Deploy in corporate environments with full data control
- Extensible Platform: Easy to add new capabilities via WebAI elements
Each element is production-ready for the WebAI ecosystem:
- ✅ Complete Documentation: Setup, usage, and integration guides
- ✅ Proper Packaging: setup.py, requirements.txt, element configs
- ✅ Type Safety: Full type hints and WebAI SDK compliance
- ✅ Error Handling: Robust error management and logging
- ✅ Performance: Optimized for production workloads
- ✅ Configurability: Rich settings UI in WebAI Navigator
Transform your Meta Ray-Ban glasses into an intelligent memory assistant that:
- 📹 Captures your daily conversations and experiences
- 🧠 Remembers every detail through multimodal AI processing
- 💬 Recalls insights when you ask "What did we discuss about...?"
- 🔍 Searches through your day using natural language queries
Meta Glasses Video → Manual Upload → Preprocess → Llama Memory → Ask Anything
↓ ↓ ↓ ↓ ↓
Daily footage → File Transfer → Audio+Visual → Store context → Get insights
- 📱 Input: Upload videos from Meta glasses (manual for hackathon)
- ⚡ Process: Auto-transcribe conversations + extract key visual moments
- 🧠 Store: Feed everything to Llama for contextual understanding
- 🗣️ Query: Ask natural questions about your day/conversations
- 💡 Insights: Get detailed responses with full context
- Meta Ray-Ban smart glasses (for video capture)
- Python 3.8+
- Llama 4 API access
- Videos captured from Meta glasses
- Install the system:
git clone <repository-url>
cd memory
pip install -r requirements.txt
cd custom-video-processor && pip install -e .
- Transfer Meta glasses videos:
# Create directory for glasses footage
mkdir glasses-videos
# Copy videos from Meta glasses (via USB/app export)
cp /path/to/meta-glasses-exports/*.mp4 glasses-videos/
- Configure for processing:
# Set your video directory
export GLASSES_VIDEO_PATH="./glasses-videos"
# Set your Llama API key
export LLAMA_API_KEY="your-llama-4-api-key"
- Process your glasses footage:
# Process all videos in the directory
python -m custom_video_processor --video-directory ./glasses-videos
- Start asking questions:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What conversations did I have today?"}]}'
Example scenario: Meta Glasses captured a 15-minute coffee conversation about a new project.
WebAI Processing Flow:
1. Media Loader: Ingests glasses video file
2. Video Processor: Extracts 3 conversation frames + full transcript
3. Llama Element: Stores multimodal context with conversation understanding
4. API Element: Provides query interface
Query Demo:
You: "What did Sarah and I discuss about project timelines?"
Memory Assistant: "You and Sarah discussed a new project timeline during your coffee meeting. Sarah expressed concern about the Q4 deadline being too aggressive, suggesting a 2-week buffer for testing. You agreed to present a revised timeline to stakeholders by Friday. She also mentioned her team's availability might be limited in December due to holiday schedules."
You: "What exactly did Sarah say about the December availability?"
Memory Assistant: "Sarah specifically mentioned that her team's availability might be limited in December due to holiday schedules. She seemed concerned about the impact on the project timeline and suggested planning critical work to complete before the holiday season."
You: "Has anyone else mentioned timeline concerns recently?"
Memory Assistant: "Yes! In your Monday team meeting, Mike also expressed concerns about aggressive timelines, specifically mentioning that the Q3 project overran by two weeks. This seems to be a recurring theme in your recent project discussions."
{
"media_type": "video_transcription",
"transcript": "So the key challenge with this project is the Q4 timeline...",
"source_file": "/path/to/glasses_video.mp4",
"video_name": "coffee_meeting_sarah.mp4",
"video_index": 0,
"total_videos": 3,
"whisper_model": "base"
}
{
"media_type": "video_frame",
"frame_data": {
"timestamp": 180.0,
"image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
"frame_number": 5400,
"size": {"width": 1088, "height": 1088}
},
"source_file": "/path/to/glasses_video.mp4",
"video_name": "coffee_meeting_sarah.mp4",
"frame_index": 9,
"total_frames": 15
}
- Video Directory: Batch process entire directories of glasses footage
- Whisper Model: tiny (fastest) → large (most accurate) for conversation transcription
- Frame Interval: Seconds between extracted frames (optimized for conversations)
- Output Frame Rate: Processing speed control
- Stay Alive: Continuous processing mode for live workflows
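For downstream consumers, a short sketch of handling the two structured output formats shown above; the field names come from the examples, while the handler itself is hypothetical:

```python
# Hypothetical consumer of the structured transcription/frame outputs above.
import base64

def handle_message(msg: dict) -> None:
    if msg.get("media_type") == "video_transcription":
        print(f"[{msg['video_name']}] transcript ({msg['whisper_model']}):")
        print(msg["transcript"][:200])
    elif msg.get("media_type") == "video_frame":
        frame = msg["frame_data"]
        image_bytes = base64.b64decode(frame["image_base64"])
        print(f"[{msg['video_name']}] frame at {frame['timestamp']}s, "
              f"{frame['size']['width']}x{frame['size']['height']}, "
              f"{len(image_bytes)} bytes")
```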
- Model Selection: Maverick (balanced) vs Scout (specialized) for conversation understanding
- Temperature: Creativity control (0.0-1.0) for response generation
- Max Tokens: Response length limits
- Chat History: Context retention toggle for cross-conversation memory
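As an illustration of the batching behavior mentioned earlier, the 8-attachment limit comes from the Llama 4 element description; the helper below is a sketch, not the element's actual code:

```python
from typing import Iterable, List

MAX_ATTACHMENTS_PER_MESSAGE = 8   # limit noted in the Llama 4 element above

def batch_attachments(frames: Iterable[dict]) -> List[List[dict]]:
    """Group extracted frames so each Llama message respects the limit."""
    items = list(frames)
    return [items[i:i + MAX_ATTACHMENTS_PER_MESSAGE]
            for i in range(0, len(items), MAX_ATTACHMENTS_PER_MESSAGE)]

# Example: 15 frames from a coffee meeting -> two messages (8 + 7 attachments).
```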
- Concurrent Requests: Parallel processing limits for multiple queries
- Queue Size: Request buffering capacity
- Timeout: Response time limits
- Authentication: API key requirements for secure access
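A rough sketch of how these request-handling settings interact; names and values are illustrative, not the element's actual configuration keys:

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 4    # parallel Llama calls allowed
REQUEST_TIMEOUT_S = 120        # per-request response time limit

semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

async def run_llama(prompt: str) -> str:     # stand-in for the real model call
    await asyncio.sleep(0.1)
    return f"(answer to: {prompt})"

async def handle_request(prompt: str) -> str:
    # Requests beyond the semaphore limit wait here until a slot frees up.
    async with semaphore:
        return await asyncio.wait_for(run_llama(prompt),
                                      timeout=REQUEST_TIMEOUT_S)
```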
- Ray-Ban Meta Glasses: Direct video capture from your POV
- Phone Upload: Manual video uploads when glasses aren't available
- Batch Import: Process entire days/weeks of footage at once
- Morning: Upload yesterday's glasses footage
- Processing: System auto-processes while you work (5-10 min for hours of video)
- Throughout Day: Ask questions about previous conversations
- Evening: Review insights and key moments from your day
- "What action items came out of the client call?"
- "Did we agree on the Q4 budget numbers?"
- "What was Sarah's concern about the timeline?"
- "What were the key points from the AI presentation?"
- "Who mentioned the new framework I should research?"
- "What networking contacts did I make today?"
- "What restaurant did Alex recommend?"
- "What was that book recommendation from coffee chat?"
- "When is my friend's birthday party again?"
- 🔄 Flow-Based Processing: Visual workflow design in Navigator
- ⚙️ Configurable Settings: No code changes for different use cases
- 📊 Real-time Monitoring: Built-in performance tracking
- 🔧 Easy Deployment: One-click deployment in WebAI environments
- 🎯 Focused Elements: Each component has a single, well-defined purpose
- Video Processing: Handles Meta glasses MP4 format via WebAI
- Audio Transcription: Whisper models (tiny → large) in WebAI element
- Frame Extraction: Smart sampling during conversations
- Llama Integration: Multimodal context storage in WebAI ecosystem
- Query Interface: OpenAI-compatible API through WebAI element
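For reference, a standalone sketch of the transcription and frame-sampling core using openai-whisper and OpenCV directly; the WebAI element wraps equivalent logic, and the 30-second interval is only an example:

```python
import cv2
import whisper

def process_clip(path: str, model_name: str = "base", interval_s: float = 30.0):
    """Transcribe a glasses clip and sample one frame every interval_s seconds."""
    transcript = whisper.load_model(model_name).transcribe(path)["text"]

    frames = []
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for frame_no in range(0, total, max(1, int(fps * interval_s))):
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no)
        ok, frame = cap.read()
        if ok:
            frames.append((frame_no / fps, frame))   # (timestamp_s, image array)
    cap.release()
    return transcript, frames
```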
- ✅ File Format: MP4 videos from Meta glasses
- ✅ Audio Quality: Optimized for conversation transcription
- ✅ Video Resolution: Handles 1088x1088 glasses format
- ✅ Duration: Processes videos of any length
- 🔄 Direct Integration: Planned for post-hackathon
- Complete WebAI ecosystem for Meta glasses video processing
- 4 production-ready WebAI elements with full SDK implementation
- Advanced conversation understanding with Llama 4 in private environment
- Natural language memory queries through WebAI API element
- Proof-of-concept for private augmented memory
- First Meta glasses memory system built entirely on WebAI Navigator
- Private multimodal AI pipeline showcasing WebAI's privacy-first approach
- Production-ready elements ready for WebAI marketplace
- Complete ecosystem demonstrating WebAI's capability for complex AI workflows
Even with manual transfer, users get:
- Private processing of sensitive conversation data
- Enterprise-grade video analysis without cloud dependencies
- Configurable workflows through WebAI Navigator interface
- Scalable architecture for growing usage
memory/
├── README.md # This file
├── requirements.txt # Base dependencies
├── .gitignore # Git ignore rules
│
├── custom-video-processor/ # 🎯 Main video processing element
│ ├── custom_video_processor/ # Core processing logic
│ ├── requirements.txt # Specific dependencies
│ ├── setup.py # Package installation
│ └── README.md # Detailed element docs
│
├── medialoader__init__.py # 📹 Media loading element
├── llama__init__.py # 🧠 Llama 4 AI integration
├── api__init__.py # 🌐 API server element
├── element_config.yaml # Legacy WebAI configuration
│
└── transcripts/ # 📝 Example outputs
└── (transcription files)
Each element is modular and can be extended independently:
- Video Processor: Add new media formats or processing algorithms for different glasses
- Llama Integration: Implement additional AI models or providers
- API Server: Add new endpoints or authentication methods
- Media Loader: Support additional input sources
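For example, a new input source could be registered alongside the existing video and image-directory handlers. This is a hypothetical sketch of that extension point, not the media loader's actual API:

```python
from pathlib import Path
from typing import Callable, Dict, Iterator

import cv2

SOURCES: Dict[str, Callable[[str], Iterator]] = {}   # hypothetical registry

def register_source(kind: str):
    def wrap(fn):
        SOURCES[kind] = fn
        return fn
    return wrap

@register_source("video")
def load_video(path: str) -> Iterator:
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

@register_source("image_dir")
def load_image_dir(path: str) -> Iterator:
    for img in sorted(Path(path).glob("*.jpg")):
        yield cv2.imread(str(img))
```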
- Transfer a few Meta glasses videos to the system
- Watch the AI process conversations and visual context
- Ask natural language questions about the content
- See how it connects conversations across multiple videos
The system is designed for easy extension:
- Add new input sources (other smart glasses, phones, etc.)
- Enhance conversation understanding
- Build custom query interfaces
- Integrate with other Meta platforms
This project is part of the WebAI ecosystem and follows WebAI licensing terms.
