A Llama-powered personal memory system that processes Meta Ray-Ban glasses footage to help you recall conversations, insights, and daily moments through intelligent video analysis.
Built for the Llama Hackathon - Your AI companion for augmented memory
- ✅ Complete video processing pipeline for Meta glasses footage
- ✅ Advanced transcription with OpenAI Whisper (5 models)
- ✅ Smart frame extraction optimized for conversations
- ✅ Llama 4 integration with multimodal understanding
- ✅ Natural language querying via API
- ✅ Batch processing for multiple videos
- ✅ WebAI Navigator elements
Due to the 24-hour hackathon constraint, we have a working MVP with manual video transfer:
Current Workflow:
- 📹 Capture: Videos recorded with Meta Ray-Ban glasses
- 📱 Transfer: Manual upload to processing system (Google Drive)
- ⚡ Process: Automatic transcription + frame extraction
- 🧠 Analyze: Llama processes and stores all context
- 💬 Query: Ask questions about your conversations
Planned post-hackathon:
- 📲 Direct API integration with Meta glasses companion app
- 🔄 Real-time processing as videos are captured
- ☁️ Seamless sync without manual file transfer
- 📱 Mobile app for instant querying on-the-go
Meta Glasses Recording → Manual Upload → Our Pipeline → Intelligent Memory
↓ ↓ ↓ ↓
"Conversation" → video.mp4 → AI Analysis → "What did we discuss?"
Built as a complete WebAI Navigator ecosystem showcasing private AI capabilities:
- ✅ 4 Production-Ready Elements: Complete pipeline implemented as WebAI elements
- ✅ Private Processing: All video analysis happens locally - no data leaves your infrastructure
- ✅ Enterprise-Ready: Built with WebAI Element SDK for production deployment
- ✅ Modular Architecture: Each component can be used independently or together
Element ID: custom-video-processor
- Advanced WebAI Element SDK implementation
- Configurable UI settings for batch processing
- Structured Frame output for WebAI ecosystem
- Stay-alive mode for continuous processing
- Full setup.py and requirements for production deployment
Element ID: media_loader (UUID: 1916c9ba-fca7-4ed3-b773-11f400def123)
- Universal media input handler for WebAI workflows
- Real-time frame rate control
- OpenCV integration with WebAI Frame format
- Support for video files and image directories
- Stay-alive capability for live processing
Element ID: llama4 (UUID: e54b5bf8-f954-4dba-a111-c45728c46e8e)
- Advanced multimodal AI element
- Smart batching system (max 8 attachments per message)
- Context management across conversations
- Both Llama-4-Maverick and Llama-4-Scout support
- Memory-efficient chat history management
Element ID: api (UUID: 68f81646-53de-4952-b171-6ee7cdbd9fb0)
- OpenAI-compatible API server element
- Complete /v1/chat/completions implementation
- Performance metrics (TPS, TTFT) tracking
- Queue management for concurrent requests
- CORS support for web integration
WebAI Navigator Workflow:
Media Loader → Custom Video Processor → Llama 4 → API Server
↓ ↓ ↓ ↓
Load videos → Extract+Transcribe → AI Analysis → External Access
- 🔒 Data Privacy: All processing stays within your WebAI environment
- 💰 Cost Control: No unpredictable cloud bills - use existing infrastructure
- 🎛️ Full Control: Own your models, data, and processing pipeline
- ⚡ Performance: Optimized for local processing with WebAI's capabilities
# Advanced WebAI patterns implemented:
- Context[Inputs, Outputs, Settings] typing
- @element.executor decorators
- ElementSettings with validation
- ElementOutputs with structured data
- Async generators for streaming
- Frame object manipulation
- Color space handling
- Settings Validation: Type-safe settings with hints and validation
- Error Handling: Comprehensive logging and exception management
- Performance Optimization: Efficient batching and memory management
- Configurability: Full UI configuration in WebAI Navigator
- Documentation: Complete setup instructions and usage examples
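To make these patterns concrete, here is a minimal, illustrative element skeleton. The type and function names are stand-ins modeled on the pattern list above, not the verbatim WebAI Element SDK imports:

```python
# Illustrative sketch only: Settings/executor stand in for the SDK's
# ElementSettings / @element.executor pattern; the real imports differ.
from dataclasses import dataclass
from pathlib import Path
from typing import Any, AsyncIterator, Dict

@dataclass
class Settings:                          # plays the role of ElementSettings
    video_directory: str = "./glasses-videos"
    whisper_model: str = "base"          # tiny -> large
    frame_interval: float = 30.0         # seconds between sampled frames

async def executor(settings: Settings) -> AsyncIterator[Dict[str, Any]]:
    """Async generator in the style of an @element.executor: streams one
    structured message per transcript or frame to downstream elements."""
    for video in sorted(Path(settings.video_directory).glob("*.mp4")):
        # A real element would run Whisper + OpenCV here; stubbed for brevity.
        yield {"media_type": "video_transcription",
               "video_name": video.name,
               "whisper_model": settings.whisper_model}
        yield {"media_type": "video_frame",
               "video_name": video.name,
               "frame_data": {"timestamp": 0.0}}
```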
Our WebAI implementation enables:
Meta Glasses Video Input
↓
WebAI Media Loader (universal input handling)
↓
WebAI Video Processor (specialized conversation processing)
↓
WebAI Llama Integration (multimodal AI with memory)
↓
WebAI API Server (OpenAI-compatible access)
↓
Natural Language Memory Queries
- Local Processing: Meta glasses footage never leaves your environment
- Scalable Architecture: Handle unlimited daily footage privately
- Enterprise Ready: Deploy in corporate environments with full data control
- Extensible Platform: Easy to add new capabilities via WebAI elements
Each element is production-ready for the WebAI ecosystem:
- ✅ Complete Documentation: Setup, usage, and integration guides
- ✅ Proper Packaging: setup.py, requirements.txt, element configs
- ✅ Type Safety: Full type hints and WebAI SDK compliance
- ✅ Error Handling: Robust error management and logging
- ✅ Performance: Optimized for production workloads
- ✅ Configurability: Rich settings UI in WebAI Navigator
Transform your Meta Ray-Ban glasses into an intelligent memory assistant that:
- 📹 Captures your daily conversations and experiences
- 🧠 Remembers every detail through multimodal AI processing
- 💬 Recalls insights when you ask "What did we discuss about...?"
- 🔍 Searches through your day using natural language queries
Meta Glasses Video → Manual Upload → Preprocess → Llama Memory → Ask Anything
↓ ↓ ↓ ↓ ↓
Daily footage → File Transfer → Audio+Visual → Store context → Get insights
- 📱 Input: Upload videos from Meta glasses (manual for hackathon)
- ⚡ Process: Auto-transcribe conversations + extract key visual moments
- 🧠 Store: Feed everything to Llama for contextual understanding
- 🗣️ Query: Ask natural questions about your day/conversations
- 💡 Insights: Get detailed responses with full context
- Meta Ray-Ban smart glasses (for video capture)
- Python 3.8+
- Llama 4 API access
- Videos captured from Meta glasses
- Install the system:
git clone <repository-url>
cd memory
pip install -r requirements.txt
cd custom-video-processor && pip install -e .
- Transfer Meta glasses videos:
# Create directory for glasses footage
mkdir glasses-videos
# Copy videos from Meta glasses (via USB/app export)
cp /path/to/meta-glasses-exports/*.mp4 glasses-videos/
- Configure for processing:
# Set your video directory
export GLASSES_VIDEO_PATH="./glasses-videos"
# Set your Llama API key
export LLAMA_API_KEY="your-llama-4-api-key"
- Process your glasses footage:
# Process all videos in the directory
python -m custom_video_processor --video-directory ./glasses-videos
- Start asking questions:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What conversations did I have today?"}]}'
Example scenario: Meta Glasses captured a 15-minute coffee conversation about a new project.
WebAI Processing Flow:
1. Media Loader: Ingests glasses video file
2. Video Processor: Extracts 3 conversation frames + full transcript
3. Llama Element: Stores multimodal context with conversation understanding
4. API Element: Provides query interface
Query Demo:
You: "What did Sarah and I discuss about project timelines?"
Memory Assistant: "You and Sarah discussed a new project timeline during your coffee meeting. Sarah expressed concern about the Q4 deadline being too aggressive, suggesting a 2-week buffer for testing. You agreed to present a revised timeline to stakeholders by Friday. She also mentioned her team's availability might be limited in December due to holiday schedules."
You: "What exactly did Sarah say about the December availability?"
Memory Assistant: "Sarah specifically mentioned that her team's availability might be limited in December due to holiday schedules. She seemed concerned about the impact on the project timeline and suggested planning critical work to complete before the holiday season."
You: "Has anyone else mentioned timeline concerns recently?"
Memory Assistant: "Yes! In your Monday team meeting, Mike also expressed concerns about aggressive timelines, specifically mentioning that the Q3 project overran by two weeks. This seems to be a recurring theme in your recent project discussions."
{
"media_type": "video_transcription",
"transcript": "So the key challenge with this project is the Q4 timeline...",
"source_file": "/path/to/glasses_video.mp4",
"video_name": "coffee_meeting_sarah.mp4",
"video_index": 0,
"total_videos": 3,
"whisper_model": "base"
}
{
"media_type": "video_frame",
"frame_data": {
"timestamp": 180.0,
"image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
"frame_number": 5400,
"size": {"width": 1088, "height": 1088}
},
"source_file": "/path/to/glasses_video.mp4",
"video_name": "coffee_meeting_sarah.mp4",
"frame_index": 9,
"total_frames": 15
}
- Video Directory: Batch process entire directories of glasses footage
- Whisper Model: tiny (fastest) → large (most accurate) for conversation transcription
- Frame Interval: Seconds between extracted frames (optimized for conversations)
- Output Frame Rate: Processing speed control
- Stay Alive: Continuous processing mode for live workflows
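For downstream consumers, a short sketch of handling the two structured output formats shown above; the field names come from the examples, while the handler itself is hypothetical:

```python
# Hypothetical consumer of the structured transcription/frame outputs above.
import base64

def handle_message(msg: dict) -> None:
    if msg.get("media_type") == "video_transcription":
        print(f"[{msg['video_name']}] transcript ({msg['whisper_model']}):")
        print(msg["transcript"][:200])
    elif msg.get("media_type") == "video_frame":
        frame = msg["frame_data"]
        image_bytes = base64.b64decode(frame["image_base64"])
        print(f"[{msg['video_name']}] frame at {frame['timestamp']}s, "
              f"{frame['size']['width']}x{frame['size']['height']}, "
              f"{len(image_bytes)} bytes")
```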
- Model Selection: Maverick (balanced) vs Scout (specialized) for conversation understanding
- Temperature: Creativity control (0.0-1.0) for response generation
- Max Tokens: Response length limits
- Chat History: Context retention toggle for cross-conversation memory
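As an illustration of the batching behavior mentioned earlier, the 8-attachment limit comes from the Llama 4 element description; the helper below is a sketch, not the element's actual code:

```python
from typing import Iterable, List

MAX_ATTACHMENTS_PER_MESSAGE = 8   # limit noted in the Llama 4 element above

def batch_attachments(frames: Iterable[dict]) -> List[List[dict]]:
    """Group extracted frames so each Llama message respects the limit."""
    items = list(frames)
    return [items[i:i + MAX_ATTACHMENTS_PER_MESSAGE]
            for i in range(0, len(items), MAX_ATTACHMENTS_PER_MESSAGE)]

# Example: 15 frames from a coffee meeting -> two messages (8 + 7 attachments).
```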
- Concurrent Requests: Parallel processing limits for multiple queries
- Queue Size: Request buffering capacity
- Timeout: Response time limits
- Authentication: API key requirements for secure access
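A rough sketch of how these request-handling settings interact; names and values are illustrative, not the element's actual configuration keys:

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 4    # parallel Llama calls allowed
REQUEST_TIMEOUT_S = 120        # per-request response time limit

semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

async def run_llama(prompt: str) -> str:     # stand-in for the real model call
    await asyncio.sleep(0.1)
    return f"(answer to: {prompt})"

async def handle_request(prompt: str) -> str:
    # Requests beyond the semaphore limit wait here until a slot frees up.
    async with semaphore:
        return await asyncio.wait_for(run_llama(prompt),
                                      timeout=REQUEST_TIMEOUT_S)
```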
- Ray-Ban Meta Glasses: Direct video capture from your POV
- Phone Upload: Manual video uploads when glasses aren't available
- Batch Import: Process entire days/weeks of footage at once
- Morning: Upload yesterday's glasses footage
- Processing: System auto-processes while you work (5-10 min for hours of video)
- Throughout Day: Ask questions about previous conversations
- Evening: Review insights and key moments from your day
- "What action items came out of the client call?"
- "Did we agree on the Q4 budget numbers?"
- "What was Sarah's concern about the timeline?"
- "What were the key points from the AI presentation?"
- "Who mentioned the new framework I should research?"
- "What networking contacts did I make today?"
- "What restaurant did Alex recommend?"
- "What was that book recommendation from coffee chat?"
- "When is my friend's birthday party again?"
- 🔄 Flow-Based Processing: Visual workflow design in Navigator
- ⚙️ Configurable Settings: No code changes for different use cases
- 📊 Real-time Monitoring: Built-in performance tracking
- 🔧 Easy Deployment: One-click deployment in WebAI environments
- 🎯 Focused Elements: Each component has a single, well-defined purpose
- Video Processing: Handles Meta glasses MP4 format via WebAI
- Audio Transcription: Whisper models (tiny → large) in WebAI element
- Frame Extraction: Smart sampling during conversations
- Llama Integration: Multimodal context storage in WebAI ecosystem
- Query Interface: OpenAI-compatible API through WebAI element
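For reference, a standalone sketch of the transcription and frame-sampling core using openai-whisper and OpenCV directly; the WebAI element wraps equivalent logic, and the 30-second interval is only an example:

```python
import cv2
import whisper

def process_clip(path: str, model_name: str = "base", interval_s: float = 30.0):
    """Transcribe a glasses clip and sample one frame every interval_s seconds."""
    transcript = whisper.load_model(model_name).transcribe(path)["text"]

    frames = []
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for frame_no in range(0, total, max(1, int(fps * interval_s))):
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no)
        ok, frame = cap.read()
        if ok:
            frames.append((frame_no / fps, frame))   # (timestamp_s, image array)
    cap.release()
    return transcript, frames
```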
- ✅ File Format: MP4 videos from Meta glasses
- ✅ Audio Quality: Optimized for conversation transcription
- ✅ Video Resolution: Handles 1088x1088 glasses format
- ✅ Duration: Processes videos of any length
- 🔄 Direct Integration: Planned for post-hackathon
- Complete WebAI ecosystem for Meta glasses video processing
- 4 production-ready WebAI elements with full SDK implementation
- Advanced conversation understanding with Llama 4 in private environment
- Natural language memory queries through WebAI API element
- Proof-of-concept for private augmented memory
- First Meta glasses memory system built entirely on WebAI Navigator
- Private multimodal AI pipeline showcasing WebAI's privacy-first approach
- Production-ready elements ready for WebAI marketplace
- Complete ecosystem demonstrating WebAI's capability for complex AI workflows
Even with manual transfer, users get:
- Private processing of sensitive conversation data
- Enterprise-grade video analysis without cloud dependencies
- Configurable workflows through WebAI Navigator interface
- Scalable architecture for growing usage
memory/
├── README.md # This file
├── requirements.txt # Base dependencies
├── .gitignore # Git ignore rules
│
├── custom-video-processor/ # 🎯 Main video processing element
│ ├── custom_video_processor/ # Core processing logic
│ ├── requirements.txt # Specific dependencies
│ ├── setup.py # Package installation
│ └── README.md # Detailed element docs
│
├── medialoader__init__.py # 📹 Media loading element
├── llama__init__.py # 🧠 Llama 4 AI integration
├── api__init__.py # 🌐 API server element
├── element_config.yaml # Legacy WebAI configuration
│
└── transcripts/ # 📝 Example outputs
└── (transcription files)
Each element is modular and can be extended independently:
- Video Processor: Add new media formats or processing algorithms for different glasses
- Llama Integration: Implement additional AI models or providers
- API Server: Add new endpoints or authentication methods
- Media Loader: Support additional input sources
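For example, a new input source could be registered alongside the existing video and image-directory handlers. This is a hypothetical sketch of that extension point, not the media loader's actual API:

```python
from pathlib import Path
from typing import Callable, Dict, Iterator

import cv2

SOURCES: Dict[str, Callable[[str], Iterator]] = {}   # hypothetical registry

def register_source(kind: str):
    def wrap(fn):
        SOURCES[kind] = fn
        return fn
    return wrap

@register_source("video")
def load_video(path: str) -> Iterator:
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

@register_source("image_dir")
def load_image_dir(path: str) -> Iterator:
    for img in sorted(Path(path).glob("*.jpg")):
        yield cv2.imread(str(img))
```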
- Transfer a few Meta glasses videos to the system
- Watch the AI process conversations and visual context
- Ask natural language questions about the content
- See how it connects conversations across multiple videos
The system is designed for easy extension:
- Add new input sources (other smart glasses, phones, etc.)
- Enhance conversation understanding
- Build custom query interfaces
- Integrate with other Meta platforms
This project is part of the WebAI ecosystem and follows WebAI licensing terms.
