Meta Glasses Memory Assistant 🕶️

A Llama-powered personal memory system that processes Meta Ray-Ban glasses footage to help you recall conversations, insights, and daily moments through intelligent video analysis.

🏆 Winner of the Meta Llama WebAI Track Hackathon - Your AI companion for augmented memory

🎥 Video Demo

Watch the video

⚡ Hackathon Status & Integration

🚀 What's Working (24-hour MVP)

  • Complete video processing pipeline for Meta glasses footage
  • Advanced transcription with OpenAI Whisper (five model sizes, tiny → large)
  • Smart frame extraction optimized for conversations
  • Llama 4 integration with multimodal understanding
  • Natural language querying via API
  • Batch processing for multiple videos
  • Four production-ready WebAI Navigator elements

🔄 Current Integration Status

Due to the 24-hour hackathon constraint, we have a working MVP with manual video transfer:

Current Workflow:

  1. 📹 Capture: Videos recorded with Meta Ray-Ban glasses
  2. 📱 Transfer: Manual upload to processing system (Google Drive)
  3. ⚡ Process: Automatic transcription + frame extraction
  4. 🧠 Analyze: Llama processes and stores all context
  5. 💬 Query: Ask questions about your conversations

🎯 Next Steps for Full Integration (Post-Hackathon)

  • 📲 Direct API integration with Meta glasses companion app
  • 🔄 Real-time processing as videos are captured
  • ☁️ Seamless sync without manual file transfer
  • 📱 Mobile app for instant querying on the go

💡 Hackathon Demo Flow

Meta Glasses Recording → Manual Upload → Our Pipeline → Intelligent Memory
         ↓                     ↓              ↓              ↓
    "Conversation"    →    video.mp4    →   AI Analysis  →  "What did we discuss?"

🏆 WebAI Navigator Accomplishments

🌟 Private AI Excellence

Built as a complete WebAI Navigator ecosystem showcasing private AI capabilities:

  • 4 Production-Ready Elements: Complete pipeline implemented as WebAI elements
  • Private Processing: All video analysis happens locally - no data leaves your infrastructure
  • Enterprise-Ready: Built with WebAI Element SDK for production deployment
  • Modular Architecture: Each component can be used independently or together

🛠️ WebAI Technical Achievements

1. Custom Video Processor Element

Element ID: custom-video-processor
- Advanced WebAI Element SDK implementation
- Configurable UI settings for batch processing
- Structured Frame output for WebAI ecosystem
- Stay-alive mode for continuous processing
- Full setup.py and requirements for production deployment

2. Media Loader Element

Element ID: media_loader (UUID: 1916c9ba-fca7-4ed3-b773-11f400def123)
- Universal media input handler for WebAI workflows
- Real-time frame rate control
- OpenCV integration with WebAI Frame format
- Support for video files and image directories
- Stay-alive capability for live processing
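
To make the frame rate control concrete, the behavior described above boils down to reading frames with OpenCV and pacing how fast they are emitted. The following is a minimal standalone sketch, not the element's actual source; the file path and rate are placeholder values, and the real element wraps each frame in the WebAI Frame format.

# Minimal sketch of frame-rate-controlled reading with OpenCV
# (illustrative only; the real element wraps each frame in the WebAI Frame format).
import time
import cv2

def read_frames(path, target_fps=2.0):
    """Yield BGR frames from a video file, paced to roughly target_fps."""
    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
            time.sleep(1.0 / target_fps)  # simple pacing; the element exposes this as a setting
    finally:
        cap.release()

for frame in read_frames("glasses-videos/coffee_meeting_sarah.mp4"):
    print(frame.shape)  # e.g. (1088, 1088, 3) for Meta glasses footage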

3. Llama 4 Integration Element

Element ID: llama4 (UUID: e54b5bf8-f954-4dba-a111-c45728c46e8e)
- Advanced multimodal AI element
- Smart batching system (max 8 attachments per message)
- Context management across conversations
- Support for both Llama-4-Maverick and Llama-4-Scout
- Memory-efficient chat history management
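
The "max 8 attachments per message" limit above is plain list chunking; a hypothetical helper (not the element's code) sketches how extracted frames could be grouped before being sent to the model:

# Sketch: group extracted frames into batches of at most 8 attachments per message,
# matching the limit noted above (hypothetical helper, not the element's code).
MAX_ATTACHMENTS = 8

def batch_attachments(frames):
    return [frames[i:i + MAX_ATTACHMENTS] for i in range(0, len(frames), MAX_ATTACHMENTS)]

batches = batch_attachments(list(range(15)))  # e.g. 15 extracted frames
print([len(b) for b in batches])              # [8, 7] -> two messages to Llama 4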

4. API Server Element

Element ID: api (UUID: 68f81646-53de-4952-b171-6ee7cdbd9fb0)
- OpenAI-compatible API server element
- Complete /v1/chat/completions implementation
- Performance metrics tracking (tokens per second, time to first token)
- Queue management for concurrent requests
- CORS support for web integration
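
Because the element exposes the standard /v1/chat/completions schema, any OpenAI-compatible client should work by pointing it at the local server. A hedged example follows; the port, API key, and model id are illustrative assumptions:

# Query the local memory API with the standard OpenAI Python client.
# The base URL, API key, and model id below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="llama-4-maverick",  # placeholder id; use whatever the element exposes
    messages=[{"role": "user", "content": "What did Sarah say about the Q4 timeline?"}],
)
print(response.choices[0].message.content)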

🔗 WebAI Ecosystem Integration

Complete Pipeline Flow

WebAI Navigator Workflow:
Media Loader → Custom Video Processor → Llama 4 → API Server
      ↓              ↓                    ↓         ↓
  Load videos → Extract+Transcribe → AI Analysis → External Access

Private AI Benefits

  • 🔒 Data Privacy: All processing stays within your WebAI environment
  • 💰 Cost Control: No unpredictable cloud bills - use existing infrastructure
  • 🎛️ Full Control: Own your models, data, and processing pipeline
  • ⚡ Performance: Optimized for local processing with WebAI's capabilities

📊 WebAI Implementation Highlights

Element SDK Mastery

# Advanced WebAI patterns implemented:
- Context[Inputs, Outputs, Settings] typing
- @element.executor decorators
- ElementSettings with validation
- ElementOutputs with structured data
- Async generators for streaming
- Frame object manipulation
- Color space handling

Production-Ready Features

  • Settings Validation: Type-safe settings with hints and validation
  • Error Handling: Comprehensive logging and exception management
  • Performance Optimization: Efficient batching and memory management
  • Configurability: Full UI configuration in WebAI Navigator
  • Documentation: Complete setup instructions and usage examples

🎯 WebAI Innovation for Meta Glasses

Multimodal Processing Pipeline

Our WebAI implementation enables:

Meta Glasses Video Input
         ↓
WebAI Media Loader (universal input handling)
         ↓  
WebAI Video Processor (specialized conversation processing)
         ↓
WebAI Llama Integration (multimodal AI with memory)
         ↓
WebAI API Server (OpenAI-compatible access)
         ↓
Natural Language Memory Queries

Private Memory Assistant

  • Local Processing: Meta glasses footage never leaves your environment
  • Scalable Architecture: Handle hours of daily footage privately
  • Enterprise Ready: Deploy in corporate environments with full data control
  • Extensible Platform: Easy to add new capabilities via WebAI elements

💡 WebAI Element Marketplace Ready

Each element is production-ready for the WebAI ecosystem:

  • ✅ Complete Documentation: Setup, usage, and integration guides
  • ✅ Proper Packaging: setup.py, requirements.txt, element configs
  • ✅ Type Safety: Full type hints and WebAI SDK compliance
  • ✅ Error Handling: Robust error management and logging
  • ✅ Performance: Optimized for production workloads
  • ✅ Configurability: Rich settings UI in WebAI Navigator

🎯 The Vision

Transform your Meta Ray-Ban glasses into an intelligent memory assistant that:

  • 📹 Captures your daily conversations and experiences
  • 🧠 Remembers every detail through multimodal AI processing
  • 💬 Recalls insights when you ask "What did we discuss about...?"
  • 🔍 Searches through your day using natural language queries

🏗️ How It Works

Meta Glasses Video → Manual Upload → Preprocess → Llama Memory → Ask Anything
       ↓                  ↓            ↓           ↓            ↓
  Daily footage → File Transfer → Audio+Visual → Store context → Get insights

The Complete Memory Pipeline:

  1. 📱 Input: Upload videos from Meta glasses (manual for hackathon)
  2. ⚡ Process: Auto-transcribe conversations + extract key visual moments
  3. 🧠 Store: Feed everything to Llama for contextual understanding
  4. 🗣️ Query: Ask natural questions about your day/conversations
  5. 💡 Insights: Get detailed responses with full context

🚀 Hackathon Demo Setup

Prerequisites

  • Meta Ray-Ban smart glasses (for video capture)
  • Python 3.8+
  • Llama 4 API access
  • Videos captured from Meta glasses

Quick Demo Setup

  1. Install the system:
git clone <repository-url>
cd memory
pip install -r requirements.txt
cd custom-video-processor && pip install -e .
  2. Transfer Meta glasses videos:
# Create directory for glasses footage
mkdir glasses-videos
# Copy videos from Meta glasses (via USB/app export)
cp /path/to/meta-glasses-exports/*.mp4 glasses-videos/
  3. Configure for processing:
# Set your video directory
export GLASSES_VIDEO_PATH="./glasses-videos"

# Set your Llama API key
export LLAMA_API_KEY="your-llama-4-api-key"
  4. Process your glasses footage:
# Process all videos in the directory
python -m custom_video_processor --video-directory ./glasses-videos
  5. Start asking questions:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What conversations did I have today?"}]}'

🎮 Live Demo Examples

Demo Scenario: Coffee Meeting

Meta Glasses captured a 15-minute coffee conversation about a new project.

WebAI Processing Flow:

1. Media Loader: Ingests glasses video file
2. Video Processor: Extracts 3 conversation frames + full transcript
3. Llama Element: Stores multimodal context with conversation understanding
4. API Element: Provides query interface

Query Demo:

You: "What did Sarah and I discuss about project timelines?"

Memory Assistant: "You and Sarah discussed a new project timeline during your coffee meeting. Sarah expressed concern about the Q4 deadline being too aggressive, suggesting a 2-week buffer for testing. You agreed to present a revised timeline to stakeholders by Friday. She also mentioned her team's availability might be limited in December due to holiday schedules."

Follow-Up Questions

You: "What exactly did Sarah say about the December availability?"

Memory Assistant: "Sarah specifically mentioned that her team's availability might be limited in December due to holiday schedules. She seemed concerned about the impact on the project timeline and suggested planning critical work to complete before the holiday season."

Cross-Video Context

You: "Has anyone else mentioned timeline concerns recently?"

Memory Assistant: "Yes! In your Monday team meeting, Mike also expressed concerns about aggressive timelines, specifically mentioning that the Q3 project overran by two weeks. This seems to be a recurring theme in your recent project discussions."

📊 Output Formats

Video Transcription

{
  "media_type": "video_transcription",
  "transcript": "So the key challenge with this project is the Q4 timeline...",
  "source_file": "/path/to/glasses_video.mp4",
  "video_name": "coffee_meeting_sarah.mp4",
  "video_index": 0,
  "total_videos": 3,
  "whisper_model": "base"
}

Video Frames

{
  "media_type": "video_frame",
  "frame_data": {
    "timestamp": 180.0,
    "image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
    "frame_number": 5400,
    "size": {"width": 1088, "height": 1088}
  },
  "source_file": "/path/to/glasses_video.mp4",
  "video_name": "coffee_meeting_sarah.mp4",
  "frame_index": 9,
  "total_frames": 15
}
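
For downstream tooling, the base64 payload in a frame record decodes back to a PNG with just the standard library. A small sketch, assuming the JSON shape shown above (file names are placeholders):

# Decode an extracted frame record back into a PNG file.
# Assumes the JSON shape shown above; the file names are placeholders.
import base64
import json

with open("frame_record.json") as f:
    record = json.load(f)

image_bytes = base64.b64decode(record["frame_data"]["image_base64"])
timestamp = record["frame_data"]["timestamp"]

with open(f"frame_{timestamp:.0f}s.png", "wb") as out:
    out.write(image_bytes)  # 1088x1088 frame captured at the given timestamp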

🔧 Configuration Options

Custom Video Processor Settings

  • Video Directory: Batch process entire directories of glasses footage
  • Whisper Model: tiny (fastest) → large (most accurate) for conversation transcription
  • Frame Interval: Seconds between extracted frames (optimized for conversations)
  • Output Frame Rate: Processing speed control
  • Stay Alive: Continuous processing mode for live workflows
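
Roughly speaking, the Whisper Model and Frame Interval settings map onto standard openai-whisper and OpenCV calls. The sketch below shows that mapping only; it is not the element's actual code, and the default values are placeholders:

# Sketch of what the Whisper Model and Frame Interval settings control
# (illustrative only; the element wires these steps into the WebAI pipeline).
import cv2
import whisper

def process_video(path, whisper_model="base", frame_interval=30.0):
    # Transcribe conversation audio with the selected Whisper model size.
    model = whisper.load_model(whisper_model)   # tiny / base / small / medium / large
    transcript = model.transcribe(path)["text"]

    # Extract one frame every `frame_interval` seconds.
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * frame_interval))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return transcript, frames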

Llama 4 Settings

  • Model Selection: Maverick (balanced) vs Scout (specialized) for conversation understanding
  • Temperature: Creativity control (0.0-1.0) for response generation
  • Max Tokens: Response length limits
  • Chat History: Context retention toggle for cross-conversation memory

API Server Settings

  • Concurrent Requests: Parallel processing limits for multiple queries
  • Queue Size: Request buffering capacity
  • Timeout: Response time limits
  • Authentication: API key requirements for secure access
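
The Concurrent Requests and Timeout settings describe standard back-pressure controls. A rough asyncio sketch of the idea (hypothetical handler and values, not the element's implementation):

# Sketch of concurrent-request limiting and timeouts as described by the settings above
# (hypothetical handler and values; not the element's implementation).
import asyncio

MAX_CONCURRENT = 4      # "Concurrent Requests" setting
REQUEST_TIMEOUT = 60.0  # "Timeout" setting, in seconds

semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def run_llama(prompt):
    await asyncio.sleep(0.1)  # stand-in for the real Llama 4 call
    return f"answer to: {prompt}"

async def handle_query(prompt):
    async with semaphore:     # excess requests wait here until a slot frees up
        return await asyncio.wait_for(run_llama(prompt), timeout=REQUEST_TIMEOUT)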

🎯 Perfect For Meta Glasses Users

📱 Input Sources

  • Ray-Ban Meta Glasses: Direct video capture from your POV
  • Phone Upload: Manual video uploads when glasses aren't available
  • Batch Import: Process entire days/weeks of footage at once

🔄 Daily Workflow

  1. Morning: Upload yesterday's glasses footage
  2. Processing: System auto-processes while you work (5-10 min for hours of video)
  3. Throughout Day: Ask questions about previous conversations
  4. Evening: Review insights and key moments from your day

💡 Real Use Cases

Business Meetings

  • "What action items came out of the client call?"
  • "Did we agree on the Q4 budget numbers?"
  • "What was Sarah's concern about the timeline?"

Learning & Conferences

  • "What were the key points from the AI presentation?"
  • "Who mentioned the new framework I should research?"
  • "What networking contacts did I make today?"

Personal Conversations

  • "What restaurant did Alex recommend?"
  • "What was that book recommendation from coffee chat?"
  • "When is my friend's birthday party again?"

🛠️ Technical Implementation

WebAI Architecture Benefits

  • 🔄 Flow-Based Processing: Visual workflow design in Navigator
  • ⚙️ Configurable Settings: No code changes for different use cases
  • 📊 Real-time Monitoring: Built-in performance tracking
  • 🔧 Easy Deployment: One-click deployment in WebAI environments
  • 🎯 Focused Elements: Each component has a single, well-defined purpose

Core Pipeline (Fully Implemented)

  • Video Processing: Handles Meta glasses MP4 format via WebAI
  • Audio Transcription: Whisper models (tiny → large) in WebAI element
  • Frame Extraction: Smart sampling during conversations
  • Llama Integration: Multimodal context storage in WebAI ecosystem
  • Query Interface: OpenAI-compatible API through WebAI element

Meta Glasses Compatibility

  • File Format: MP4 videos from Meta glasses
  • Audio Quality: Optimized for conversation transcription
  • Video Resolution: Handles 1088x1088 glasses format
  • Duration: Processes videos of any length
  • 🔄 Direct Integration: Planned for post-hackathon
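
Before processing, it can help to confirm that an exported clip matches the expected glasses format; a small helper sketch using OpenCV (the path is a placeholder):

# Sketch: inspect an exported clip's resolution and duration with OpenCV
# (the path is a placeholder; 1088x1088 matches the glasses format noted above).
import cv2

def describe_video(path):
    cap = cv2.VideoCapture(path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 0.0
    frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()
    duration = frame_count / fps if fps else 0.0
    print(f"{path}: {width}x{height}, {duration:.1f}s")

describe_video("glasses-videos/coffee_meeting_sarah.mp4")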

🏆 Hackathon Achievement

🌟 What We Built in 24 Hours

  • Complete WebAI ecosystem for Meta glasses video processing
  • 4 production-ready WebAI elements with full SDK implementation
  • Advanced conversation understanding with Llama 4 in private environment
  • Natural language memory queries through WebAI API element
  • Proof-of-concept for private augmented memory

🎯 WebAI Innovation

  • First Meta glasses memory system built entirely on WebAI Navigator
  • Private multimodal AI pipeline showcasing WebAI's privacy-first approach
  • Production-ready elements ready for WebAI marketplace
  • Complete ecosystem demonstrating WebAI's capability for complex AI workflows

📈 Immediate Value

Even with manual transfer, users get:

  • Private processing of sensitive conversation data
  • Enterprise-grade video analysis without cloud dependencies
  • Configurable workflows through WebAI Navigator interface
  • Scalable architecture for growing usage

📁 Repository Structure

memory/
├── README.md                          # This file
├── requirements.txt                   # Base dependencies
├── .gitignore                        # Git ignore rules
│
├── custom-video-processor/            # 🎯 Main video processing element
│   ├── custom_video_processor/        # Core processing logic
│   ├── requirements.txt              # Specific dependencies
│   ├── setup.py                      # Package installation
│   └── README.md                     # Detailed element docs
│
├── medialoader__init__.py             # 📹 Media loading element
├── llama__init__.py                   # 🧠 Llama 4 AI integration
├── api__init__.py                     # 🌐 API server element
├── element_config.yaml               # Legacy WebAI configuration
│
└── transcripts/                       # 📝 Example outputs
    └── (transcription files)

🛠️ Development

Adding New Features

Each element is modular and can be extended independently:

  • Video Processor: Add new media formats or processing algorithms for different glasses
  • Llama Integration: Implement additional AI models or providers
  • API Server: Add new endpoints or authentication methods
  • Media Loader: Support additional input sources

🤝 Try It Yourself

For Hackathon Judges

  1. Transfer a few Meta glasses videos to the system
  2. Watch the AI process conversations and visual context
  3. Ask natural language questions about the content
  4. See how it connects conversations across multiple videos

For Developers

The system is designed for easy extension:

  • Add new input sources (other smart glasses, phones, etc.)
  • Enhance conversation understanding
  • Build custom query interfaces
  • Integrate with other Meta platforms

📜 License

This project is part of the WebAI ecosystem and follows WebAI licensing terms.
