
Self-Aware AI

LINK TO THE DEMO

An AI assistant that reads emotions from live video and responds with voice through a speech-to-text, LLM, and text-to-speech pipeline.

Features

  • Real-time Emotion Detection: Uses computer vision to detect emotions from video feed
  • Voice Processing: Complete audio pipeline with speech-to-text, AI response generation, and text-to-speech
  • Streaming Responses: Real-time AI responses that adapt to detected emotions
  • Push-to-Talk Interface: Hold button to record, release to process
  • Visual Feedback: Live emotion display and conversation stream

Architecture

The application consists of two integrated servers:

  1. Flask Frontend (Port 5001): Handles video processing, emotion detection, and UI
  2. Node.js Backend (Port 3001): Processes audio with STT, LLM, and TTS services

Audio Processing Pipeline

  1. Audio Recording → WebM audio capture
  2. Speech-to-Text → Deepgram STT API
  3. AI Response → Inflection AI streaming responses
  4. Text-to-Speech → Resemble AI voice synthesis
  5. Audio Playback → Real-time audio streaming
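
The sketch below illustrates this flow in Python (the repo's actual implementation lives in the TypeScript backend). The Deepgram call follows its public prerecorded-transcription API; `generate_reply()` and `synthesize_speech()` are hypothetical placeholders for the Inflection AI and Resemble AI steps, whose exact APIs are not shown here:

import os
import requests

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]

def transcribe(webm_bytes: bytes) -> str:
    """Step 2: speech-to-text via Deepgram's prerecorded endpoint."""
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        params={"model": "nova-2", "language": "en-US"},
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/webm",
        },
        data=webm_bytes,
    )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def generate_reply(transcript: str, emotion: str) -> str:
    """Step 3 (placeholder): generate a response with Inflection AI,
    conditioning on the detected emotion."""
    raise NotImplementedError

def synthesize_speech(text: str) -> bytes:
    """Step 4 (placeholder): synthesize audio with Resemble AI."""
    raise NotImplementedError

def process_recording(webm_bytes: bytes, emotion: str) -> bytes:
    transcript = transcribe(webm_bytes)          # step 2
    reply = generate_reply(transcript, emotion)  # step 3
    return synthesize_speech(reply)              # step 4: audio for playback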

Setup

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • npm or yarn

API Keys Required

You'll need API keys for:

  • Deepgram: Speech-to-text processing
  • Inflection AI: LLM responses
  • Resemble AI: Text-to-speech synthesis

Installation

  1. Clone the repository

    git clone <repository-url>
    cd self-aware-real
  2. Install Python dependencies

    pip install flask flask-socketio requests opencv-python numpy
  3. Install Node.js dependencies

    cd backend
    npm install
    cd ..
  4. Configure API Keys

    # Copy the example environment file
    cp backend/env.example backend/.env
    
    # Edit backend/.env with your API keys:
    # DEEPGRAM_API_KEY=your_key_here
    # INFLECTION_API_KEY=your_key_here  
    # RESEMBLE_API_KEY=your_key_here
    # RESEMBLE_PROJECT_UUID=your_uuid_here
    # RESEMBLE_VOICE_UUID=your_voice_uuid_here

Running the Application

Quick Start

./start.sh

This starts both the Flask frontend (port 5001) and the Node.js backend (port 3001) automatically.

Manual Start

If you prefer to run servers separately:

  1. Start the Node.js backend:

    cd backend
    npm run dev
  2. Start the Flask frontend (in another terminal):

    python3 selfaware.py

Usage

  1. Open your browser to http://localhost:5001
  2. Click "Start" to activate the system
  3. Allow camera and microphone permissions
  4. Hold the button to record your voice
  5. Release the button to process and get AI response
  6. Watch the emotion detection update in real-time

Features in Detail

Emotion Detection

  • Real-time facial emotion recognition
  • Visual emotion indicator with color coding
  • Emotion data forwarded to AI for context-aware responses
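
For orientation, here is a minimal sketch of a detection loop using opencv-python (already in the dependencies). The Haar-cascade face detector is standard OpenCV; `classify_emotion()` is a hypothetical stand-in for the repo's actual emotion model:

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def classify_emotion(face_img) -> str:
    """Hypothetical placeholder: always returns 'neutral' here;
    plug in the real emotion model."""
    return "neutral"

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        emotion = classify_emotion(gray[y:y + h, x:x + w])
        cv2.putText(frame, emotion, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("emotion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()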

Voice Processing

  • Push-to-talk recording interface
  • High-quality speech-to-text transcription
  • Streaming AI responses with emotion awareness
  • Natural voice synthesis and playback

Integration Benefits

  • Minimal UI Changes: Existing interface preserved
  • Enhanced Functionality: Full voice processing pipeline
  • Real-time Processing: Streaming responses and audio
  • Emotion Context: AI responses adapt to detected emotions

API Endpoints

Flask Frontend

  • GET /: Main application interface
  • POST /api/audio/process: Proxy to Node.js backend for audio processing
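
A hedged sketch of what the proxy route could look like; the multipart field name "audio" is an assumption and may differ from the repo's actual code:

from flask import Flask, Response, request
import requests

app = Flask(__name__)
BACKEND = "http://localhost:3001"

@app.route("/api/audio/process", methods=["POST"])
def proxy_audio():
    # Forward the uploaded recording to the Node.js backend as-is.
    audio = request.files["audio"]  # assumed field name
    resp = requests.post(
        f"{BACKEND}/api/audio/process",
        files={"audio": (audio.filename, audio.stream, audio.mimetype)},
    )
    return Response(
        resp.content,
        status=resp.status_code,
        content_type=resp.headers.get("Content-Type"),
    )

if __name__ == "__main__":
    app.run(port=5001)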

Node.js Backend

  • POST /api/audio/process: Process uploaded audio files
  • GET /health: Backend health check
  • WebSocket /: Real-time communication for streaming responses
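
To exercise the backend independently of the Flask proxy, you can hit these endpoints directly. A minimal Python client sketch; again, the "audio" field name is an assumption:

import requests

# Health check
r = requests.get("http://localhost:3001/health")
print(r.status_code, r.text)

# Send a recorded WebM clip straight to the backend
with open("sample.webm", "rb") as f:
    resp = requests.post(
        "http://localhost:3001/api/audio/process",
        files={"audio": ("sample.webm", f, "audio/webm")},
    )
print(resp.status_code)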

Configuration

Environment Variables (backend/.env)

# Required API Keys
DEEPGRAM_API_KEY=your_deepgram_key
INFLECTION_API_KEY=your_inflection_key
RESEMBLE_API_KEY=your_resemble_key
RESEMBLE_PROJECT_UUID=your_project_uuid
RESEMBLE_VOICE_UUID=your_voice_uuid

# Server Configuration
PORT=3001
CORS_ORIGIN=http://localhost:5001

# Optional Deepgram Settings
DEEPGRAM_MODEL=nova-2
DEEPGRAM_LANGUAGE=en-US
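
If the backend fails at startup, a quick way to confirm the file is complete is a small check like the one below (a simple parser assuming plain KEY=value lines, no quoting or export syntax):

from pathlib import Path

REQUIRED = [
    "DEEPGRAM_API_KEY", "INFLECTION_API_KEY", "RESEMBLE_API_KEY",
    "RESEMBLE_PROJECT_UUID", "RESEMBLE_VOICE_UUID",
]

env = {}
for line in Path("backend/.env").read_text().splitlines():
    line = line.strip()
    if line and not line.startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()

missing = [k for k in REQUIRED if not env.get(k)]
print("missing keys:", missing or "none")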

Troubleshooting

Common Issues

  1. Backend connection failed: Ensure the Node.js backend is running on port 3001
  2. Audio not working: Check microphone permissions and API keys
  3. Emotion detection not working: Ensure camera permissions are granted and the emotion server is running
  4. API errors: Verify all API keys are correctly set in backend/.env
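
A one-shot connectivity check for both servers (ports taken from this README):

import requests

for name, url in [
    ("Flask frontend", "http://localhost:5001/"),
    ("Node.js backend", "http://localhost:3001/health"),
]:
    try:
        r = requests.get(url, timeout=3)
        print(f"{name}: HTTP {r.status_code}")
    except requests.RequestException:
        print(f"{name}: not reachable at {url}")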

Logs

  • Backend logs: Check terminal output from Node.js server
  • Frontend logs: Check browser console for client-side errors
  • Audio processing logs: Located in backend/logs/ directory

Development

Project Structure

self-aware-real/
├── selfaware.py          # Flask frontend server
├── templates/
│   └── index.html        # Main UI template
├── backend/              # Node.js backend
│   ├── src/
│   │   ├── index.ts      # Backend server entry
│   │   ├── websocket.ts  # WebSocket handling
│   │   └── services/     # Audio processing services
│   └── package.json
├── start.sh              # Startup script
└── README.md

Adding Features

The integration is designed to be minimal and extensible:

  • Add new API endpoints in Flask for additional features
  • Extend Node.js backend for new audio processing capabilities
  • Modify HTML template for UI enhancements
  • Use WebSocket communication for real-time features
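
As a hypothetical example of the first point, a new Flask route exposing the latest detected emotion could look like this (the /api/emotion/current endpoint does not exist in the repo):

from flask import Flask, jsonify

app = Flask(__name__)
latest_emotion = {"emotion": "neutral"}  # updated by the detection loop

@app.route("/api/emotion/current")
def current_emotion():
    """Expose the most recently detected emotion as JSON."""
    return jsonify(latest_emotion)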

License

[Add your license information here]
