A real-time American Sign Language (ASL) learning platform combining rhythm-based gaming with AI-powered hand sign recognition.
SignHero is a full-stack application that teaches ASL fingerspelling through interactive gameplay. The system uses a webcam to detect hand signs in real time via a trained machine learning model, then challenges players to sign along to beatmaps synced with music: like Guitar Hero, but with sign language.
| Component | Description |
|---|---|
| Frontend Game (`asl/`) | Next.js 15 web app with rhythm game modes, visual effects, and real-time scoring |
| ML Backend (`Base test/`) | PyTorch model (MobileNetV2) trained on ASL alphabet data |
| API Server (`api_server_http.py`) | FastAPI server for real-time sign prediction via webcam frames |
- Song Game: Rhythm-based gameplay with a scrolling note highway and combo scoring
- Training Mode: Step-by-step practice with visual hand pose hints
- Testing Mode: Timed challenges with accuracy tracking
- Whack-A-Sign: Arcade-style reflex game
- Real-time webcam analysis using MediaPipe hand tracking
- MobileNetV2 CNN for letter classification (A-Z)
- ~30-50ms inference latency on localhost
- Synthwave aesthetic with neon effects
- Particle bursts, screen flash, streak glow
- Sound effects for hits/misses
```
┌───────────────────────────────────────────────────────┐
│                    User's Browser                     │
│  ┌─────────────────────────────────────────────────┐  │
│  │ Next.js Game (asl/)                             │  │
│  │  • GameCanvas    • NoteHighway    • WebcamFeed  │  │
│  │  • useGameLoop   • useSignDetection • useWebcam │  │
│  └────────────────────────┬────────────────────────┘  │
└───────────────────────────┼───────────────────────────┘
                            │ HTTP POST /predict_frame
                            │ (JPEG + timestamp)
┌───────────────────────────┼───────────────────────────┐
│  FastAPI Server (api_server_http.py)                  │
│  ┌────────────────────────▼────────────────────────┐  │
│  │ ASLPredictor                                    │  │
│  │  1. Decode JPEG (cv2)                           │  │
│  │  2. Hand Detection (MediaPipe)                  │  │
│  │  3. Feature Extraction (landmark mask)          │  │
│  │  4. CNN Inference (PyTorch MobileNetV2)         │  │
│  │  5. Return {letter, confidence, handDetected}   │  │
│  └─────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────┘
```
See ARCHITECTURE.md for detailed system diagrams.
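As a concrete illustration of the request/response cycle above, the sketch below grabs one webcam frame with OpenCV and posts it to the prediction endpoint, timing the round trip. It assumes the API server is running locally on port 8000; the multipart field names (`frame`, `timestamp`) are assumptions, not confirmed details of the API.

```python
# Minimal round-trip sketch: webcam frame -> POST /predict_frame -> result.
# Assumes the FastAPI server is running at http://localhost:8000 and accepts
# a multipart JPEG; the field names "frame" and "timestamp" are assumptions.
import time
import cv2
import requests

cap = cv2.VideoCapture(0)  # default webcam
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read a frame from the webcam")

ok, jpeg = cv2.imencode(".jpg", frame)  # encode the frame as JPEG bytes
start = time.perf_counter()
resp = requests.post(
    "http://localhost:8000/predict_frame",
    files={"frame": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
    data={"timestamp": str(time.time())},
)
resp.raise_for_status()
latency_ms = (time.perf_counter() - start) * 1000
# Expected shape per the diagram: {"letter": ..., "confidence": ..., "handDetected": ...}
print(resp.json(), f"({latency_ms:.0f} ms round trip)")
```

In the app itself, the `useWebcam` and `useSignDetection` hooks drive the equivalent loop from the browser.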
```
ASL-Fun-Training/
│
├── asl/                                   # 🎮 Next.js Frontend Application
│   ├── src/
│   │   ├── app/                           # App Router pages (game, training, testing)
│   │   ├── components/game/               # GameCanvas, NoteHighway, effects
│   │   ├── hooks/                         # useGameLoop, useSignDetection, useWebcam
│   │   └── lib/                           # Scoring, beatmaps, utilities
│   └── README.md                          # Frontend-specific docs
│
├── Base test/Sign-Language-Recognition/   # 🧠 ML Training & Model
│   ├── app/                               # API & frame extraction scripts
│   ├── model/                             # CNN architecture (MobileNetV2)
│   ├── train/                             # Training scripts
│   ├── data/weights/                      # Trained model weights (.pth)
│   └── utils/                             # Label mapping, model loading
│
├── api_server_http.py                     # 🚀 FastAPI prediction server
├── api_server_mock.py                     # Mock server for testing
├── start-servers.sh                       # One-click startup (Unix)
├── start-servers.bat                      # One-click startup (Windows)
│
├── models/                                # Additional model storage
├── training/                              # Training data/scripts
├── dataset/                               # Raw dataset
├── asset_generation/                      # Hand sign SVG assets
│
├── ARCHITECTURE.md                        # System architecture diagrams
├── INTEGRATION_GUIDE.md                   # Full integration documentation
├── QUICKSTART_INTEGRATION.md              # Quick setup guide
└── SETUP.md                               # Environment setup
```
- Python 3.10+ with pip
- Node.js 20+ with pnpm
- MongoDB instance
- Webcam for sign detection
```bash
# Navigate to project root
cd ASL-Fun-Training

# Create Python virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install torch torchvision opencv-python mediapipe fastapi uvicorn python-multipart

# Start the API server
python api_server_http.py
# Server runs at http://localhost:8000
```

```bash
# In a new terminal
cd asl

# Install dependencies
pnpm install

# Set up environment
cp .env.example .env
# Edit .env with your MongoDB URI

# Start dev server
pnpm dev
# App runs at http://localhost:3000
```

```bash
# Unix/Mac
./start-servers.sh

# Windows
start-servers.bat
```

Frontend:

| Tech | Purpose |
|---|---|
| Next.js 15 | React framework (App Router, Turbopack) |
| TypeScript | Type safety |
| Tailwind CSS 4 | Styling |
| Framer Motion | Animations |
| tRPC | Type-safe API layer |
| Prisma + MongoDB | Database |
ML Backend:

| Tech | Purpose |
|---|---|
| FastAPI | HTTP API server |
| PyTorch | Deep learning framework |
| MobileNetV2 | CNN architecture for classification |
| MediaPipe | Hand landmark detection |
| OpenCV | Image processing |
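For a concrete picture of the server side, here is a minimal sketch in the shape of the `/predict_frame` endpoint from `api_server_http.py`. It is illustrative only (the prediction itself is stubbed out), and the multipart field names are assumptions; the real server runs the full `ASLPredictor` pipeline described below.

```python
# Illustrative FastAPI endpoint in the shape of /predict_frame.
# The prediction is stubbed out; the real server runs the full
# ASLPredictor pipeline. Field names ("frame", "timestamp") are assumptions.
import cv2
import numpy as np
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/predict_frame")
async def predict_frame(
    frame: UploadFile = File(...), timestamp: float = Form(...)
):
    data = np.frombuffer(await frame.read(), dtype=np.uint8)
    image = cv2.imdecode(data, cv2.IMREAD_COLOR)  # step 1: decode JPEG
    if image is None:
        return {"letter": None, "confidence": 0.0, "handDetected": False}
    # Steps 2-4 (MediaPipe detection, feature mask, CNN inference) run here.
    letter, confidence = "A", 0.0  # placeholder result
    return {"letter": letter, "confidence": confidence, "handDetected": True}
```

A server like this can be started with `uvicorn module_name:app`; `uvicorn` and `python-multipart` are already in the dependency list above.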
The sign detection model is a MobileNetV2 trained on hand landmark features:
```
Input: 224×224 RGB image (hand feature mask)
        ↓
   MobileNetV2 CNN
        ↓
Output: 26 classes (A-Z)
```
Training pipeline (sketched in code below):

1. Webcam captures hand images
2. MediaPipe extracts 21 hand landmarks
3. Landmarks are drawn as a feature mask on a black background
4. Both the original and mirrored images are processed
5. The higher confidence of the two is used for the prediction
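Steps 2 and 3 can be sketched as follows: a minimal illustration using MediaPipe's Python Hands solution and its drawing utilities. The `landmark_mask` helper is hypothetical; the actual extraction scripts in `Base test/` may crop and preprocess differently.

```python
# Sketch of the landmark-mask feature extraction: detect 21 hand landmarks
# with MediaPipe and draw them on a black 224x224 canvas.
# The landmark_mask helper is hypothetical; the real scripts may differ.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

def landmark_mask(bgr_frame: np.ndarray, size: int = 224) -> np.ndarray | None:
    """Return a size x size RGB mask of hand landmarks, or None if no hand."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # no hand detected in this frame
    mask = np.zeros((size, size, 3), dtype=np.uint8)  # black background
    mp_draw.draw_landmarks(
        mask, results.multi_hand_landmarks[0], mp_hands.HAND_CONNECTIONS
    )
    return mask  # 224x224 feature mask fed to the CNN
```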
Model weights: `Base test/Sign-Language-Recognition/data/weights/asl_crop_v4_1_mobilenet_weights.pth`
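Steps 4 and 5 of the pipeline (mirrored inference, max confidence) can be sketched as below. This is a minimal illustration assuming torchvision's stock `mobilenet_v2` with a 26-way head, a plain state-dict checkpoint, and alphabetical label order; the `predict` helper is hypothetical, and the project's own model class in `model/` may differ.

```python
# Sketch of dual-orientation inference: run the feature mask and its mirror
# through the CNN and keep the more confident prediction.
# Assumptions: torchvision's stock MobileNetV2, a plain state-dict checkpoint,
# and labels ordered A-Z; the project's real model class may differ.
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(num_classes=26)
state = torch.load(
    "Base test/Sign-Language-Recognition/data/weights/"
    "asl_crop_v4_1_mobilenet_weights.pth",
    map_location="cpu",
)
model.load_state_dict(state)
model.eval()

@torch.no_grad()
def predict(mask: torch.Tensor) -> tuple[str, float]:
    """mask: normalized float tensor of shape (1, 3, 224, 224)."""
    batch = torch.cat([mask, torch.flip(mask, dims=[3])])  # original + mirrored
    probs = torch.softmax(model(batch), dim=1)             # shape (2, 26)
    conf, idx = probs.max(dim=1)   # best letter for each orientation
    best = conf.argmax()           # keep the more confident orientation
    return chr(ord("A") + idx[best].item()), conf[best].item()
```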
| Document | Description |
|---|---|
| ARCHITECTURE.md | System diagrams, data flow, timing model |
| INTEGRATION_GUIDE.md | Full integration documentation |
| QUICKSTART_INTEGRATION.md | 5-minute setup guide |
| SETUP.md | Environment configuration |
| asl/README.md | Frontend-specific documentation |
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing`)
5. Open a Pull Request
Educational project for ASL learning.