We wanted to build something that feels human. Music is deeply emotional, yet most apps treat it as data: playlists, algorithms, and charts. We thought: what if your computer could actually feel your vibe and create music to match it? That’s where Angelus was born — an AI that reads your expressions and body rhythm to generate music that matches your emotional state in real time.
Angelus uses your webcam to understand how you feel and move. It detects facial expressions (happy, sad, stressed, excited), tracks head movement to estimate BPM (beats per minute), and recognizes hand gestures for playback control. After analyzing a 10-second window of biometric data, it uses AI to generate custom music matched to your emotional state and energy level. The system also integrates with your Spotify profile to personalize genre selection based on your listening history.
Frontend: Built with React, TypeScript, and Vite, featuring real-time WebRTC video streaming, Spotify OAuth integration via @spotify/web-api-ts-sdk, and dynamic audio visualization.
Backend: Python Flask server with WebRTC (aiortc) for low-latency video processing. We use MediaPipe Face Mesh (468 3D facial landmarks) for emotion detection and head-motion BPM estimation via velocity-based signal processing with Welch PSD and autocorrelation analysis. Emotion classification runs through Roboflow’s inference server with EMA smoothing and hysteresis filtering for stable predictions.
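To make the head-motion BPM idea concrete, here is a minimal sketch (not the project's exact code) that band-passes the vertical velocity of a facial landmark and reads the dominant frequency off a Welch periodogram. The function name and frame rate are assumptions for illustration; the 0.6–3.2 Hz band reuses the figures quoted below.

```python
"""Illustrative sketch of estimating BPM from head motion: band-pass the
vertical velocity of a landmark (e.g. the nose tip) and find the dominant
frequency with a Welch periodogram. Names and constants are our own."""
import numpy as np
from scipy.signal import butter, filtfilt, welch

FPS = 30.0         # assumed webcam frame rate
BAND = (0.6, 3.2)  # head-nod band in Hz (36-192 BPM), as described in the writeup

def estimate_bpm_from_head_motion(nose_y: np.ndarray, fps: float = FPS) -> float:
    """nose_y: vertical coordinate of one facial landmark per frame."""
    # Velocity emphasises rhythmic nodding over slow posture drift.
    velocity = np.gradient(nose_y) * fps

    # Band-pass to the range of musically meaningful head movement.
    nyquist = fps / 2.0
    b, a = butter(2, [BAND[0] / nyquist, BAND[1] / nyquist], btype="band")
    filtered = filtfilt(b, a, velocity)

    # Welch PSD: the strongest in-band peak gives the nod frequency.
    freqs, psd = welch(filtered, fs=fps, nperseg=min(len(filtered), 256))
    in_band = (freqs >= BAND[0]) & (freqs <= BAND[1])
    peak_hz = freqs[in_band][np.argmax(psd[in_band])]
    return float(peak_hz * 60.0)  # Hz -> beats per minute

if __name__ == "__main__":
    # Synthetic 10-second clip: a ~2 Hz nod (120 BPM) plus noise.
    t = np.arange(0, 10, 1 / FPS)
    demo = 0.01 * np.sin(2 * np.pi * 2.0 * t) + 0.002 * np.random.randn(len(t))
    print(f"Estimated BPM: {estimate_bpm_from_head_motion(demo):.0f}")
```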
AI Pipeline: Collected biometric data (emotion distribution, average BPM) feeds into Claude 3.5 Haiku via the Anthropic API, which generates creative, contextually aware music prompts. These prompts are sent to either the ElevenLabs Music Generation API or Suno AI to synthesize 30-second audio tracks.
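As a rough illustration of the prompt step, the sketch below summarizes the collected biometrics and asks Claude 3.5 Haiku for a music prompt through the Anthropic Python SDK. The system prompt, model alias, and field names are our own assumptions, not the project's actual prompt.

```python
"""A minimal sketch of turning the biometric summary into a music prompt with
the Anthropic SDK. Prompt wording and parameter values are illustrative."""
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def build_music_prompt(emotions: dict[str, float], avg_bpm: float,
                       top_genres: list[str]) -> str:
    """emotions: label -> share of the 10-second window, e.g. {"happy": 0.7}."""
    summary = (
        f"Emotion distribution: {emotions}. "
        f"Average head-motion tempo: {avg_bpm:.0f} BPM. "
        f"Listener's favourite genres: {', '.join(top_genres)}."
    )
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumed model alias
        max_tokens=200,
        system=("You write short, vivid prompts for a music-generation model. "
                "Describe mood, tempo, instrumentation, and genre in one paragraph."),
        messages=[{"role": "user", "content": summary}],
    )
    return response.content[0].text

if __name__ == "__main__":
    print(build_music_prompt({"happy": 0.7, "excited": 0.3}, 118,
                             ["indie pop", "electronic"]))
```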
Key Technologies: OpenCV, NumPy, SciPy (signal processing), asyncio (concurrent video processing), Tailwind CSS, and hand gesture detection via MediaPipe Hands for palm-based controls.
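For the palm-based controls, a hedged sketch with MediaPipe Hands could look like the following: it counts extended fingers and treats an open palm as a playback toggle. The gesture mapping and helper names are illustrative assumptions, not the project's exact logic.

```python
"""Sketch of palm detection with MediaPipe Hands: a finger counts as extended
when its tip sits above its PIP joint (smaller y in image coordinates)."""
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky tips
FINGER_PIPS = (6, 10, 14, 18)   # corresponding PIP joints

def is_open_palm(hand_landmarks) -> bool:
    lm = hand_landmarks.landmark
    extended = sum(lm[tip].y < lm[pip].y
                   for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))
    return extended == 4          # four extended fingers ~= open palm

def main() -> None:
    cap = cv2.VideoCapture(0)
    with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.6) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV captures BGR.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks and is_open_palm(results.multi_hand_landmarks[0]):
                print("open palm detected -> toggle playback")
            if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
                break
    cap.release()

if __name__ == "__main__":
    main()
```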
Challenges we ran into

Architecting WebRTC streaming around Flask's synchronous nature, which required running asyncio event loops in separate threads with custom exception handlers for connection stability (see the bridging sketch at the end of this section).
Implementing robust BPM detection from subtle head movements, combining Savitzky-Golay smoothing, band-pass filtering (0.6–3.2 Hz), and Kalman filtering to eliminate noise while staying responsive.
Preventing emotion classification jitter with exponential moving averages (EMA) and hysteresis thresholds to avoid rapid label switching (see the smoothing sketch below).
Handling unofficial Suno API rate limits and session management, which required fallback logic to ElevenLabs with proper error handling for CAPTCHAs and connection resets.
Synchronizing 10-second data collection windows across async video frames while maintaining real-time UI feedback.

Accomplishments that we're proud of

Built a full camera-to-music pipeline with multi-modal biometric fusion (facial emotion + head BPM + hand gestures) in under 36 hours.
Achieved sub-100ms latency for facial landmark extraction, processing 468 3D points per frame at 30 FPS.
Implemented production-grade signal processing, including Welch periodograms, normalized autocorrelation, and scalar Kalman filters for BPM smoothing.
Created an intelligent prompt generation system that considers a user's Spotify preferences, emotional context, and physical energy to craft coherent music descriptions for generative AI.
Designed a cohesive brand identity where the technology feels invisible: Angelus feels futuristic but human.

What we learned

We discovered that building empathetic AI requires balancing responsiveness with stability: raw sensor data is noisy, so we layered multiple filtering techniques (EMA, Kalman, hysteresis) to extract meaningful patterns without lag. We gained deep expertise in WebRTC architecture, learning to bridge async Python video processing with synchronous Flask endpoints using asyncio.run_coroutine_threadsafe(). On the AI side, we learned that prompt engineering is crucial; Claude's ability to transform raw biometric statistics into creative, musically coherent descriptions made the difference between generic and emotionally resonant output. We also learned that good UX isn't about showing all the data; it's about showing the right data at the right time.
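For reference, here is a minimal sketch of the Flask/asyncio bridge described above, assuming a dedicated event loop running in a background thread; the coroutine is a placeholder standing in for the real aiortc negotiation.

```python
"""Sketch of bridging synchronous Flask handlers to an asyncio loop running in
a background thread, with a custom exception handler on the loop."""
import asyncio
import threading
from flask import Flask, jsonify, request

app = Flask(__name__)

# Dedicated event loop living in its own thread.
loop = asyncio.new_event_loop()

def _handle_loop_exception(loop: asyncio.AbstractEventLoop, context: dict) -> None:
    # A dropped peer connection should not kill the whole loop.
    print("async error:", context.get("exception") or context.get("message"))

loop.set_exception_handler(_handle_loop_exception)
threading.Thread(target=loop.run_forever, daemon=True).start()

async def handle_offer(sdp: str) -> dict:
    """Placeholder for the real aiortc offer/answer coroutine."""
    await asyncio.sleep(0.05)               # pretend to negotiate
    return {"type": "answer", "sdp": sdp}   # echo back for the demo

@app.route("/offer", methods=["POST"])
def offer():
    # Schedule the coroutine on the background loop and block for its result.
    sdp = request.get_json()["sdp"]
    future = asyncio.run_coroutine_threadsafe(handle_offer(sdp), loop)
    return jsonify(future.result(timeout=5))

if __name__ == "__main__":
    app.run(port=5000)
```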
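And a small sketch of the EMA-plus-hysteresis idea we used to keep emotion labels stable: per-class scores are exponentially smoothed, and the displayed label only switches when a challenger beats the current label by a margin. The alpha and margin values here are example numbers, not the tuned ones.

```python
"""Illustrative EMA + hysteresis smoothing for per-frame emotion scores."""
from collections import defaultdict

class EmotionSmoother:
    def __init__(self, alpha: float = 0.2, margin: float = 0.15):
        self.alpha = alpha      # EMA weight given to each new frame
        self.margin = margin    # hysteresis: challenger must win by this much
        self.scores = defaultdict(float)
        self.current = None

    def update(self, frame_scores: dict[str, float]) -> str:
        # Exponential moving average of each class score.
        for label, score in frame_scores.items():
            self.scores[label] = (1 - self.alpha) * self.scores[label] + self.alpha * score

        best = max(self.scores, key=self.scores.get)
        if self.current is None:
            self.current = best
        # Only switch labels when the best class clearly beats the current one.
        elif best != self.current and \
                self.scores[best] - self.scores[self.current] > self.margin:
            self.current = best
        return self.current

if __name__ == "__main__":
    smoother = EmotionSmoother()
    noisy_frames = [{"happy": 0.6, "sad": 0.4}, {"happy": 0.4, "sad": 0.6},
                    {"happy": 0.7, "sad": 0.3}, {"sad": 0.9, "happy": 0.1}]
    print([smoother.update(f) for f in noisy_frames])  # stays on "happy"
```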
We’re working on:
Continuous emotion adaptation: Using streaming music APIs to modulate tempo, key, and intensity mid-song based on real-time biometric feedback.
Mobile application: A React Native version using the phone's front camera with on-device TensorFlow Lite models for privacy-preserving emotion detection.
Expanded biometric inputs: Eye tracking for attention detection, micro-expression analysis for stress/anxiety states, and posture detection for energy levels.
Social features: Shareable "mood snapshots" showing your emotional journey and generated soundtrack.
Mood Marketplace: A platform where artists can upload emotion-tagged tracks that Angelus intelligently blends with generated content.