daily.dairy.demo-mini.mp4
Daily Diary lets users create short daily videos just by talking. Before bed, you tell the AI about your day — it listens, understands the story, finds related photos and videos, adds narration and music, and produces a short “memory movie.” It captures not only what happened, but how it felt.
It is an intelligent photo memory assistant that combines real-time voice conversation with visual analysis. Upload photos and engage in meaningful conversations about your memories while the AI analyzes your images and asks thoughtful questions to help you reflect on your experiences.
I took photos at the hackathon and made this video using Daily diary
vid_f0a24376-60af-4d98-833e-858d552aeaad.mp4
Gemini Integration:
- Gemini 2.5 gemini-2.5-flash for real-time voice conversations via Pipecat's Gemini integration
- Gemini 2.5 gemini-2.5-flash-image for intelligent photo analysis, generating empathetic responses about user memories
- Custom prompts designed for emotional understanding and memory exploration
Pipecat Integration:
- Real-time WebRTC voice communication through Daily.co transport
- Custom pipeline handling photo uploads and analysis results
- Voice UI Kit components for polished user experience
- AWS S3, Lambda: Secure photo storage with presigned URL uploads, video generation
- Daily.co: WebRTC infrastructure for real-time communication
Gemini:
- Excellent: Natural conversation flow with minimal latency
- Great: Easy integration with Pipecat's existing infrastructure
- Suggestion: More examples of custom prompt engineering for specific use cases.
Roleis different from other models (Gemini only has "Model", "User") which was a bit confusing.
Pipecat:
- Loved: Voice UI Kit components saved significant development time!
- Challenge: While Pipecat is super flexible and has many ways to achieve something, I often was not sure what's the best way. For instance, when handling a frame, there is
push_frameandqueue_frame. I wasn't quite sure what's the difference between these two functions and I wish there were a good example that illustrates the behavior.
client/: Next.js app with Voice UI Kit components and resizable layoutserver/: Python bot integrating Gemini with Pipecat