An AI assistant that can read emotions from video and respond with voice using advanced speech processing.
- **Real-time Emotion Detection**: Uses computer vision to detect emotions from the video feed
- **Voice Processing**: Complete audio pipeline with speech-to-text, AI response generation, and text-to-speech
- **Streaming Responses**: Real-time AI responses that adapt to detected emotions
- **Push-to-Talk Interface**: Hold the button to record, release to process
- **Visual Feedback**: Live emotion display and conversation stream
The application consists of two integrated servers:
- Flask Frontend (Port 5001): Handles video processing, emotion detection, and UI
- Node.js Backend (Port 3001): Processes audio with STT, LLM, and TTS services
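As an illustration of how the Flask side can hand audio off to the Node.js backend, here is a minimal proxy sketch; the `audio` form field and the error handling are assumptions for illustration, not the verified contents of selfaware.py:

```python
# Hypothetical proxy sketch: forwards a recorded audio blob from the Flask
# frontend to the Node.js backend. Field names are illustrative assumptions.
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
BACKEND_URL = "http://localhost:3001"  # Node.js audio backend

@app.route("/api/audio/process", methods=["POST"])
def proxy_audio():
    audio = request.files.get("audio")
    if audio is None:
        return jsonify({"error": "no audio file provided"}), 400
    # Forward the uploaded WebM blob to the backend unchanged.
    resp = requests.post(
        f"{BACKEND_URL}/api/audio/process",
        files={"audio": (audio.filename, audio.stream, audio.mimetype)},
        timeout=60,
    )
    return resp.content, resp.status_code
```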
- Audio Recording → WebM audio capture
- Speech-to-Text → Deepgram STT API
- AI Response → Inflection AI streaming responses
- Text-to-Speech → Resemble AI voice synthesis
- Audio Playback → Real-time audio streaming
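Sketched in Python for illustration (the real pipeline lives in the TypeScript backend, and the vendor calls here are stubs rather than actual API signatures), one push-to-talk turn moves through the stages like this:

```python
# Pipeline shape only: the production implementation is TypeScript, and each
# stage below is a stub standing in for the corresponding vendor API call.
def transcribe(webm_bytes: bytes) -> str:
    """Speech-to-text (Deepgram in this app)."""
    ...

def generate_reply(transcript: str, emotion: str) -> str:
    """LLM response (Inflection AI), conditioned on the detected emotion."""
    ...

def synthesize(text: str) -> bytes:
    """Text-to-speech (Resemble AI)."""
    ...

def handle_turn(webm_bytes: bytes, emotion: str) -> bytes:
    # One push-to-talk turn: recorded audio in, spoken reply audio out.
    transcript = transcribe(webm_bytes)
    reply = generate_reply(transcript, emotion)
    return synthesize(reply)
```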
- Python 3.8+
- Node.js 16+
- npm or yarn
You'll need API keys for:
- Deepgram: Speech-to-text processing
- Inflection AI: LLM responses
- Resemble AI: Text-to-speech synthesis
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd self-aware-real
  ```
- Install Python dependencies:

  ```bash
  pip install flask flask-socketio requests opencv-python numpy
  ```
- Install Node.js dependencies:

  ```bash
  cd backend
  npm install
  cd ..
  ```
- Configure API keys:

  ```bash
  # Copy the example environment file
  cp backend/env.example backend/.env

  # Edit backend/.env with your API keys:
  # DEEPGRAM_API_KEY=your_key_here
  # INFLECTION_API_KEY=your_key_here
  # RESEMBLE_API_KEY=your_key_here
  # RESEMBLE_PROJECT_UUID=your_uuid_here
  # RESEMBLE_VOICE_UUID=your_voice_uuid_here
  ```
```bash
./start.sh
```

This will start both servers automatically:
- Frontend: http://localhost:5001
- Backend: http://localhost:3001
If you prefer to run servers separately:
- Start the Node.js backend:

  ```bash
  cd backend
  npm run dev
  ```

- Start the Flask frontend (in another terminal):

  ```bash
  python3 selfaware.py
  ```
- Open your browser to http://localhost:5001
- Click "Start" to activate the system
- Allow camera and microphone permissions
- Hold the button to record your voice
- Release the button to process and get AI response
- Watch the emotion detection update in real-time
- Real-time facial emotion recognition
- Visual emotion indicator with color coding
- Emotion data forwarded to AI for context-aware responses
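For illustration, the detected emotion could travel with the audio request as an extra form field; the `emotion` field name and the response handling below are assumptions about the payload, not confirmed backend behavior:

```python
# Hypothetical client sketch: attach the current emotion label to the audio
# upload so the LLM prompt can be conditioned on it. Field names are assumed.
import requests

def send_turn(webm_path: str, emotion: str) -> bytes:
    with open(webm_path, "rb") as f:
        resp = requests.post(
            "http://localhost:3001/api/audio/process",
            files={"audio": ("clip.webm", f, "audio/webm")},
            data={"emotion": emotion},  # e.g. "happy", "sad", "neutral"
            timeout=60,
        )
    resp.raise_for_status()
    return resp.content  # synthesized reply audio (assumed response body)
```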
- Push-to-talk recording interface
- High-quality speech-to-text transcription
- Streaming AI responses with emotion awareness
- Natural voice synthesis and playback
- Minimal UI Changes: Existing interface preserved
- Enhanced Functionality: Full voice processing pipeline
- Real-time Processing: Streaming responses and audio
- Emotion Context: AI responses adapt to detected emotions
Flask frontend (port 5001):

- GET /: Main application interface
- POST /api/audio/process: Proxy to the Node.js backend for audio processing

Node.js backend (port 3001):

- POST /api/audio/process: Process uploaded audio files
- GET /health: Backend health check
- WebSocket /: Real-time communication for streaming responses
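A quick smoke test of the health endpoint from Python (the exact shape of the response body is not documented here, so only the HTTP status is relied upon):

```python
# Minimal health check: confirms the Node.js backend is reachable before
# sending any audio.
import requests

r = requests.get("http://localhost:3001/health", timeout=5)
r.raise_for_status()  # raises if the backend is down or unhealthy
print("backend OK:", r.text)
```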
```bash
# Required API Keys
DEEPGRAM_API_KEY=your_deepgram_key
INFLECTION_API_KEY=your_inflection_key
RESEMBLE_API_KEY=your_resemble_key
RESEMBLE_PROJECT_UUID=your_project_uuid
RESEMBLE_VOICE_UUID=your_voice_uuid

# Server Configuration
PORT=3001
CORS_ORIGIN=http://localhost:5001

# Optional Deepgram Settings
DEEPGRAM_MODEL=nova-2
DEEPGRAM_LANGUAGE=en-US
```

- Backend connection failed: Ensure the Node.js backend is running on port 3001
- Audio not working: Check microphone permissions and API keys
- Emotion detection not working: Ensure camera permissions are granted and the emotion server is running
- API errors: Verify all API keys are correctly set in backend/.env (a startup-check sketch follows the logging notes below)
- Backend logs: Check terminal output from Node.js server
- Frontend logs: Check browser console for client-side errors
- Audio processing logs: Located in the backend/logs/ directory
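As hinted at in the "API errors" item above, missing keys are easiest to catch at startup. The backend itself loads these in Node.js, so the following Python check is a pattern sketch rather than project code:

```python
# Illustrative startup check for the required environment variables; the real
# backend is Node.js, so treat this as a pattern, not project code.
import os
import sys

REQUIRED_KEYS = [
    "DEEPGRAM_API_KEY",
    "INFLECTION_API_KEY",
    "RESEMBLE_API_KEY",
    "RESEMBLE_PROJECT_UUID",
    "RESEMBLE_VOICE_UUID",
]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    sys.exit(f"Missing required environment variables: {', '.join(missing)}")
```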
```
self-aware-real/
├── selfaware.py          # Flask frontend server
├── templates/
│   └── index.html        # Main UI template
├── backend/              # Node.js backend
│   ├── src/
│   │   ├── index.ts      # Backend server entry
│   │   ├── websocket.ts  # WebSocket handling
│   │   └── services/     # Audio processing services
│   └── package.json
├── start.sh              # Startup script
└── README.md
```
The integration is designed to be minimal and extensible:
- Add new API endpoints in Flask for additional features
- Extend Node.js backend for new audio processing capabilities
- Modify HTML template for UI enhancements
- Use WebSocket communication for real-time features
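For example, a new Flask endpoint that also pushes an update over the WebSocket channel might look like the sketch below; the route, event name, and payload are made up for illustration:

```python
# Hypothetical extension sketch: a REST endpoint that returns the latest
# emotion and broadcasts it to WebSocket clients. Names are illustrative.
from flask import Flask, jsonify
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

@app.route("/api/emotion/latest")
def latest_emotion():
    emotion = {"label": "neutral", "confidence": 0.0}  # placeholder value
    socketio.emit("emotion_update", emotion)  # push to connected clients
    return jsonify(emotion)

if __name__ == "__main__":
    socketio.run(app, port=5001)
```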
[Add your license information here]