Textifying Speaking

Description

Textifying Speaking is a local full-stack web application that transcribes audio and video files and can optionally summarize the resulting transcriptions. It is aimed at students, professionals, and anyone else who needs to turn multimedia content into text and concise summaries.

Features

🔐 User Authentication

User Registration (US-01): Secure user registration with email and password
- Username validation (3-30 characters)
- Email format validation
- Password strength enforcement (minimum 8 characters)
- Secure password hashing using bcrypt
- Duplicate email/username detection
- Success modal with navigation options
User Login (US-02): Secure JWT-based authentication
- Email and password validation
- Bcrypt password verification
- JWT token generation (1-hour expiration)
- Client-side token storage
- Invalid credentials handling
- Success notification with automatic redirect

📁 File Management

Media File Upload (US-03): Upload audio/video files for transcription
- Supported formats: MP3, WAV, MP4, M4A
- Maximum file size: 100MB
- Drag-and-drop interface
- Real-time upload progress tracking
- Client-side file validation
- Server-side file type and size validation
- JWT-protected endpoint (authentication required)
- Multipart form data support with Axios
Dashboard & File Management (US-04): View and manage uploaded files
- List all user's uploaded files in a responsive grid layout
- File cards display: filename, type, size, upload date, status
- Color-coded status badges (uploaded, processing, completed, failed)
- File type icons for audio/video files
- View detailed file information in modal
- Delete files with confirmation prompt
- Ownership validation (users can only delete their own files)
- Dynamic UI updates after deletion
- Empty state with upload CTA
- JWT-protected endpoints (authentication required)
Real-Time File Status Updates (US-05): Monitor file processing in real-time
- WebSocket-based real-time status updates using Socket.IO
- Status indicators: uploading, ready, processing, completed, error
- Progress tracking (0-100%) for files being processed
- Connection status indicator in dashboard
- Automatic UI updates without page refresh
- Status badges with animated icons:
  - Uploading: Purple with spinner
  - Ready: Green
  - Processing: Yellow with spinning cog + progress percentage
  - Completed: Blue with checkmark
  - Error: Red with alert icon
- Progress bar visualization in file details modal
- Error message display for failed operations
- Toast notifications for completed files and errors
- JWT-secured WebSocket connections
- User-scoped updates (only see your own file updates)
- Status update API endpoint for testing/integration
Audio/Video Transcription (US-06): Transcribe media files to text
- One-click transcription initiation from dashboard
- "Transcribe" button for files in 'ready' status
- Async transcription processing (non-blocking)
- OpenAI Whisper (medium) model via HuggingFace Transformers for speech recognition
- Python-based transcription service (Flask + Transformers)
- Real-time status updates via WebSocket
- Status progression: ready → processing → completed/error
- Transcribed text stored in database
- View transcribed text in modal with copy-to-clipboard
- File ownership validation (users can only transcribe their own files)
- File type validation (audio/video only)
- Duplicate transcription prevention
- Error handling with descriptive messages
- Support for multiple audio formats (MP3, WAV, M4A, MP4)
- GPU acceleration support when available
- Containerized transcription service with Docker
- JWT-protected transcription endpoint
Background Real-time Transcription Progress (US-07): Process transcriptions asynchronously with real-time feedback
- BullMQ job queue for background transcription processing
- Redis as message broker for distributed job queue
- Non-blocking transcription (returns immediately after job enqueue)
- Background worker processes jobs with configurable concurrency (2 jobs simultaneously)
- Automatic retry mechanism (up to 3 attempts with exponential backoff)
- Real-time progress updates every 5% (5%, 10%, 15%, ..., 90%, 95%, 100%)
- WebSocket broadcasts of progress to authenticated users
- Google Drive-style floating progress indicator in UI:
  - Fixed bottom-right corner position
  - Shows all files currently processing
  - Real-time progress bars with percentage
  - Auto-hides when no files processing
- Enhanced toast notifications with rich content:
  - Transcription started (info with spinner icon)
  - Transcription completed (success with checkmark)
  - Transcription failed (error with details)
- User can navigate freely while transcription runs in background
- Failed jobs retained for debugging and monitoring
- Dashboard updates automatically without page refresh
- Scalable architecture for handling multiple concurrent transcriptions
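
The 5% progress granularity described above can be sketched as a small throttle that only lets a broadcast through when a new multiple of 5 is reached. This is an illustrative Python sketch, not the backend's actual (NestJS) code; `next_broadcast` and `simulate` are hypothetical names.

```python
from typing import List, Optional

STEP = 5  # percent; matches the 5% granularity listed above


def next_broadcast(raw_progress: float, last_sent: int) -> Optional[int]:
    """Return the 5%-step value to broadcast, or None if no new step was reached."""
    clamped = max(0.0, min(100.0, raw_progress))
    stepped = int(clamped // STEP) * STEP  # round down to a multiple of 5
    return stepped if stepped > last_sent else None


def simulate(raw_values: List[float]) -> List[int]:
    """Feed raw progress values through the throttle and collect what is sent."""
    sent, last = [], 0
    for raw in raw_values:
        value = next_broadcast(raw, last)
        if value is not None:
            sent.append(value)
            last = value
    return sent
```

Note that when the raw progress jumps (for example from 12% to 99%), this sketch sends only the latest step rather than every intermediate one.
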
View Transcription (US-08): Access and review transcribed text
- Dedicated GET /media/:id/transcription endpoint for secure access
- View transcribed text in responsive modal dialog
- Copy-to-clipboard functionality for easy text extraction
- Displays file metadata (filename, file ID) with transcription
- Status-aware responses:
  - Completed: Returns full transcription text
  - Processing: Shows progress and "in progress" message
  - Error: Displays error message from failed transcription
  - Ready/Uploading: Indicates transcription hasn't started
- Ownership validation (users can only view their own transcriptions)
- Real-time UI updates when transcription completes via WebSocket
- Scrollable container for long transcriptions
- Whitespace-preserved text formatting
- Error handling for missing or incomplete transcriptions
- JWT-protected endpoint ensuring data privacy
- Integrates with existing Dashboard file cards
- "View Text" button appears for completed files with transcription
Summarize Transcription (US-09): Generate AI-powered summaries of completed transcriptions
- Dedicated POST /media/:id/summarize endpoint with JWT authentication
- Uses mT5_multilingual_XLSum model for multilingual abstractive text summarization
- Supports 45+ languages for comprehensive international coverage
- Asynchronous processing via BullMQ job queue (non-blocking API responses)
- Real-time WebSocket updates for summarization progress and completion
- Dashboard features:
  - "Summarize" button for completed files with transcriptions
  - "View Summary" button displays after successful summarization
  - Summary modal shows both summary and original transcription
  - Copy-to-clipboard for both summary and transcription
  - Color-coded UI (purple theme) to distinguish from transcription
- Validation and error handling:
  - Only files with completed transcriptions can be summarized
  - File ownership validation (403 if unauthorized)
  - Prevents duplicate summarization when already processing
  - Descriptive error messages for failures
- Progress tracking:
  - Files tracked with summaryStatus field (pending, processing, completed, error)
  - Real-time toast notifications (start, completion, error)
  - FloatingProgressIndicator shows summarization progress
  - Visual indicators with animated icons
- Background processing:
  - Automatic chunking for texts longer than 512 words
  - Configurable summary length (30-150 tokens)
  - Retry mechanism (up to 3 attempts with exponential backoff)
  - Failed jobs retained for debugging
- Data persistence:
  - Summary stored in MongoDB (summaryText field)
  - Error messages captured (summaryErrorMessage field)
  - Summary status tracked separately from transcription status
- Independent service architecture:
  - Python Flask service on port 5001
  - GPU acceleration support via CUDA
  - Scalable and isolated from transcription service
- WebSocket events for real-time updates:
  - summaryStatusUpdate broadcasts to user when status changes
  - Automatic UI synchronization without page refresh
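
The chunking step mentioned under "Background processing" can be illustrated with a word-based splitter. This is a minimal sketch under assumptions: the README only states the 512-word threshold, so splitting on whitespace, the `chunk_words` name, and what happens to the per-chunk summaries afterwards are all hypothetical.

```python
from typing import List

MAX_WORDS = 512  # threshold quoted in the feature list above


def chunk_words(text: str, max_words: int = MAX_WORDS) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()  # assumption: words are whitespace-separated
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk would then be summarized separately and the partial summaries combined; how the service combines them is not specified here.
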
Background Real-time Summary Status (US-10): Process summarizations asynchronously with real-time feedback
- BullMQ job queue for background summarization processing
- Redis as message broker for distributed job queue
- Non-blocking summarization (returns immediately after job enqueue)
- Background worker processes jobs with configurable concurrency (2 jobs simultaneously)
- Automatic retry mechanism (up to 3 attempts with exponential backoff)
- Real-time status updates via WebSocket broadcasts to authenticated users
- Google Drive-style floating progress indicator in UI:
  - Shows all files currently processing (transcription and summarization)
  - Purple gradient theme for summarization progress
  - Real-time progress updates
  - Auto-hides when no files processing
  - Multiple files displayed with scroll support
- Enhanced toast notifications with rich content:
  - Summarization started (info with spinner icon)
  - Summarization completed (success with checkmark)
  - Summarization failed (error with details and error message)
- User can navigate freely while summarization runs in background
- Failed jobs retained for debugging and monitoring
- Dashboard updates automatically without page refresh
- Duplicate job prevention (cannot start multiple summarizations for same file)
- Scalable architecture for handling multiple concurrent summarizations
- Independent from transcription processing (parallel processing support)
- Status management via summaryStatus field (pending, processing, completed, error)
- Error messages captured and displayed to users
- Complete integration with existing real-time update infrastructure
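
To make "up to 3 attempts with exponential backoff" concrete, the sketch below computes the wait before each retry, doubling every time. The 1000 ms base delay is an assumption for illustration; the README does not state the configured value, and in the real backend the schedule is handled by BullMQ rather than by code like this.

```python
from typing import List


def backoff_delays(attempts: int = 3, base_delay_ms: int = 1000) -> List[int]:
    """Delay (ms) before each retry; N total attempts leave N - 1 retries."""
    return [base_delay_ms * 2 ** retry for retry in range(attempts - 1)]
```

With the assumed defaults, a job that fails twice waits 1000 ms and then 2000 ms before its third and final attempt.
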
View Summary Alongside Transcription (US-11): Compare and review transcription with summary side-by-side
- Enhanced modal dialog with side-by-side layout for transcription and summary
- GET /media/:id/transcription endpoint returns both transcription and summary data
- Side-by-side comparison view:
  - Left panel: Full transcription with indigo theme
  - Right panel: AI-generated summary with purple theme
  - Independent scrolling for each panel
  - Copy-to-clipboard buttons for both transcription and summary
- Status-aware summary display:
  - Completed: Shows generated summary with copy functionality
  - Processing: Animated spinner with "Summary in progress..." message
  - Pending: Clock icon with suggestion to click "Summarize" button
  - Error: Alert icon with error message details
  - Not started: Placeholder with instructions to generate summary
- Real-time updates when summary becomes ready:
  - WebSocket integration automatically updates modal
  - No page refresh required
  - Toast notifications on status changes
  - Smooth UI transitions between states
- Enhanced API response format:
```
{
  "status": "completed",
  "transcription": "Full transcribed text...",
  "summaryText": "AI-generated summary...",
  "summaryStatus": "completed",
  "fileId": "...",
  "originalFilename": "..."
}
```
- Responsive design:
  - Desktop: Two-column side-by-side layout
  - Tablet/Mobile: Single-column stacked layout (future enhancement)
  - Maximum viewport height (90vh) with scrollable content
- File ownership validation (403 if unauthorized)
- Integrated with existing Dashboard:
  - "View Text" button opens new side-by-side modal
  - Summary modal deprecated in favor of unified view
  - Consistent styling with existing UI components
- Enhanced user experience:
  - Clear visual distinction between transcription and summary
  - Status indicators prevent confusion about summary availability
  - Automatic updates keep displayed content synchronized
  - File metadata displayed in modal header
- Supports all transcription/summary states:
  - File not yet transcribed: Shows appropriate message
  - Transcription in progress: Displays progress information
  - Transcription completed, summary not started: Shows transcription with prompt to summarize
  - Both completed: Full side-by-side comparison view
- Performance optimized:
  - Lazy loading of summary text
  - Efficient WebSocket message handling
  - Minimal re-renders on status updates

Tech Stack

Backend

Framework: NestJS 10.x (TypeScript)
Database: MongoDB with Mongoose ODM
Cache/Queue: Redis 7.0 for BullMQ job queue
Job Queue: BullMQ with @nestjs/bullmq for background job processing
Authentication: bcrypt for password hashing, JWT for session management
File Upload: Multer for multipart form data handling
Real-Time Communication: Socket.IO with @nestjs/websockets & @nestjs/platform-socket.io
HTTP Client: axios for service-to-service communication
Validation: class-validator & class-transformer
Configuration: @nestjs/config for environment variables
Testing: Jest for unit and E2E tests

Transcription Service

Language: Python 3.11
Framework: Flask 3.0
AI Model: OpenAI Whisper (medium) via HuggingFace Transformers
ML Libraries: PyTorch, Transformers, Accelerate
Audio Processing: FFmpeg
Container: Python 3.11-slim Docker image
Optimization: Lazy loading (model loads on first request)

Summarization Service

Language: Python 3.11
Framework: Flask 3.0
AI Model: mT5_multilingual_XLSum via HuggingFace Transformers (45+ languages)
ML Libraries: PyTorch, Transformers, Accelerate, SentencePiece
Container: Python 3.11-slim Docker image
Optimization: Lazy loading (model loads on first request)

GPU Memory Optimization

Both AI services use lazy loading to optimize GPU memory usage:

Models load on-demand (on first request) rather than at service startup
Services start in seconds without loading heavy AI models
Both services can coexist on same GPU without memory conflicts
Health endpoint includes model_loaded field to verify model status
Implementation: get_pipeline() in transcription, get_summarizer() in summarization
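
The lazy-loading pattern described above can be sketched as follows. This is not the services' actual code: `LazyModel`, the `loader` callable, and the `"status": "ok"` field are illustrative assumptions; only the `model_loaded` field and the load-on-first-request behavior come from this README.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Defers an expensive model load until the first call to get()."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader           # hypothetical factory, e.g. builds a pipeline
        self._model: Optional[Any] = None
        self._lock = threading.Lock()   # stops two requests from loading at once

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        if self._model is None:
            with self._lock:
                if self._model is None:  # re-check after acquiring the lock
                    self._model = self._loader()
        return self._model


def health(model: LazyModel) -> dict:
    """Health payload exposing model_loaded ("status" is an assumed extra field)."""
    return {"status": "ok", "model_loaded": model.loaded}
```

In a service, a module-level `LazyModel` would play the role of `get_pipeline()` or `get_summarizer()`: startup stays fast, and GPU memory is only claimed once a request actually needs the model.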

Frontend

Framework: React 19 + Vite 7
Routing: React Router DOM 7
Styling: TailwindCSS 4
HTTP Client: Axios for API requests & file uploads
Real-Time Communication: Socket.IO client (socket.io-client)
Form Validation: Yup for schema-based validation
State Management: Zustand with persist middleware
Notifications: React Toastify
Icons: Iconify React

DevOps

Containerization: Docker & Docker Compose
Services: Frontend (port 5173), Backend (port 3001), Transcription (port 5000), Summarization (port 5001), MongoDB (port 27017), Redis (port 6379)
Build Tool: Makefile for common tasks

Quick Start

Prerequisites

Docker and Docker Compose V2
Make (optional, for using Makefile commands)

Using Docker (Recommended)

# Clone the repository
git clone <repository-url>
cd textifying-speaking

# Build and start all services
make quickstart
# OR
docker compose up --build -d

# Access the application
# Frontend: http://localhost:5173
# Backend: http://localhost:3001
# Transcription: http://localhost:5000
# MongoDB: mongodb://localhost:27017
# Redis: redis://localhost:6379

Using Makefile

# View all available commands
make help

# Build containers
make build

# Start services
make up

# Stop services
make down

# View logs
make logs
make logs-backend
make logs-frontend
make logs-transcription
make logs-redis

# Run tests
make test-backend

# Access container shells
make shell-backend
make shell-frontend
make shell-transcription
make shell-redis
make redis-cli
make db-shell

# Clean up (removes volumes)
make clean

API Documentation

Authentication Endpoints

POST `/auth/register`

Register a new user account.

Request Body:

{
  "username": "johndoe",
  "email": "john@example.com",
  "password": "securePassword123"
}

Success Response (201):

{
  "message": "User registered successfully",
  "user": {
    "id": "507f1f77bcf86cd799439011",
    "username": "johndoe",
    "email": "john@example.com",
    "createdAt": "2025-11-19T20:00:00.000Z"
  }
}

Error Responses:

400 Bad Request: Validation errors (invalid email, weak password, etc.)
409 Conflict: Email or username already exists

Validation Rules:

username: Required, 3-30 characters
email: Required, valid email format
password: Required, minimum 8 characters
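
The validation rules above can be expressed as a small checker. The backend itself uses class-validator in NestJS; this Python sketch only mirrors the rules, and the email pattern and error strings are simplified assumptions.

```python
import re
from typing import Dict, List

# Deliberately simple pattern: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_registration(payload: Dict[str, object]) -> List[str]:
    """Return a list of validation errors (empty when the payload is valid)."""
    errors = []
    username = payload.get("username")
    email = payload.get("email")
    password = payload.get("password")

    if not isinstance(username, str) or not 3 <= len(username) <= 30:
        errors.append("username must be 3-30 characters")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append("email must be a valid email address")
    if not isinstance(password, str) or len(password) < 8:
        errors.append("password must be at least 8 characters")
    return errors
```

A non-empty result corresponds to the 400 Bad Request case; duplicate detection (409 Conflict) needs a database lookup and is outside this sketch.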

POST `/auth/login`

Authenticate a user and receive a JWT token.

Request Body:

{
  "email": "john@example.com",
  "password": "securePassword123"
}

Success Response (200):

{
  "message": "Login successful",
  "accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "507f1f77bcf86cd799439011",
    "username": "johndoe",
    "email": "john@example.com"
  }
}

Error Responses:

400 Bad Request: Validation errors (missing fields, invalid email format)
401 Unauthorized: Invalid credentials (wrong email or password)

Validation Rules:

email: Required, valid email format
password: Required

JWT Token:

Expiration: 1 hour
Payload includes: user ID (sub), email, username
Store in localStorage on client-side for subsequent authenticated requests
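
To show what such a token contains, here is a standard-library sketch that signs and verifies an HS256 JWT with the payload fields listed above and a 1-hour lifetime. HS256 is inferred from the example token's header; the function names are hypothetical, and the real backend issues tokens through NestJS rather than code like this.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def sign_token(user: dict, secret: str, now: int, ttl_seconds: int = 3600) -> str:
    """Build header.payload.signature with sub, email, username, iat and exp."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user["id"], "email": user["email"],
               "username": user["username"], "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_token(token: str, secret: str, now: int) -> dict:
    """Return the payload, or raise ValueError on a bad signature or expiry."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(_sign(signing_input, secret), signature):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would use the current Unix time.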

Media Endpoints

POST `/media/upload`

Upload an audio or video file for transcription.

Authentication: Required (Bearer JWT token)

Request:

Content-Type: multipart/form-data
Body: Form data with file field

Example (curl):

curl -X POST http://localhost:3001/media/upload \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "file=@/path/to/audio.mp3"

Success Response (201):

{
  "message": "File uploaded successfully",
  "file": {
    "id": "507f1f77bcf86cd799439011",
    "filename": "file-1637258400000-123456789.mp3",
    "originalFilename": "audio.mp3",
    "mimetype": "audio/mpeg",
    "size": 2048576,
    "uploadDate": "2025-11-19T20:00:00.000Z",
    "status": "uploaded"
  }
}

Error Responses:

400 Bad Request: No file uploaded, invalid file type, or file too large
401 Unauthorized: Missing or invalid JWT token

Validation Rules:

Allowed MIME types: audio/mpeg, audio/wav, audio/x-wav, video/mp4, audio/mp4, audio/x-m4a
Maximum file size: 100MB
Allowed extensions: .mp3, .wav, .mp4, .m4a
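
These rules amount to three checks, sketched below in Python for illustration (the backend performs them in NestJS with Multer). Treating "100MB" as 100 × 1024 × 1024 bytes, the check order, and the error strings are assumptions.

```python
import os
from typing import Optional

ALLOWED_MIME_TYPES = {
    "audio/mpeg", "audio/wav", "audio/x-wav",
    "video/mp4", "audio/mp4", "audio/x-m4a",
}
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".m4a"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB


def validate_upload(filename: str, mimetype: str, size: int) -> Optional[str]:
    """Return an error message, or None when the upload is acceptable."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported extension: {ext or '(none)'}"
    if mimetype not in ALLOWED_MIME_TYPES:
        return f"unsupported MIME type: {mimetype}"
    if size > MAX_SIZE_BYTES:
        return "file too large"
    return None
```

Any non-`None` result corresponds to the 400 Bad Request response listed above.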

Storage:

Files stored in: MEDIA_STORAGE_PATH (default: ./uploads)
Filename format: file-{timestamp}-{random}.{ext}
Metadata stored in MongoDB
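
The filename format can be reproduced with a few lines of Python. The millisecond timestamp and the nine-digit random component are inferred from the example filename in the upload response, not confirmed by the backend code; `stored_filename` is a hypothetical helper.

```python
import os
import random
import time
from typing import Optional


def stored_filename(original: str,
                    now_ms: Optional[int] = None,
                    rand: Optional[int] = None) -> str:
    """Build file-{timestamp}-{random}.{ext} from the original filename."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    rand = random.randrange(10 ** 9) if rand is None else rand
    ext = os.path.splitext(original)[1].lower()  # includes the leading dot
    return f"file-{now_ms}-{rand}{ext}"
```

Generating the stored name on the server, rather than reusing the user's filename, avoids collisions and path-related surprises; the original name is kept separately as `originalFilename`.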

GET `/media`

List all uploaded files for the authenticated user.

Authentication: Required (Bearer JWT token)

Example (curl):

curl -X GET http://localhost:3001/media \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Success Response (200):

{
  "files": [
    {
      "id": "507f1f77bcf86cd799439011",
      "filename": "file-1637258400000-123456789.mp3",
      "originalFilename": "audio.mp3",
      "mimetype": "audio/mpeg",
      "size": 2048576,
      "uploadDate": "2025-11-19T20:00:00.000Z",
      "status": "uploaded"
    }
  ]
}

Error Responses:

401 Unauthorized: Missing or invalid JWT token

Response Fields:

id: Unique file identifier
filename: Stored filename on server
originalFilename: Original filename uploaded by user
mimetype: File MIME type
size: File size in bytes
uploadDate: Upload timestamp (ISO 8601)
status: Processing status (uploaded, processing, completed, failed)

DELETE `/media/:id`

Delete a specific file by ID.

Authentication: Required (Bearer JWT token)

Path Parameters:

id: File ID (MongoDB ObjectId)

Example (curl):

curl -X DELETE http://localhost:3001/media/507f1f77bcf86cd799439011 \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Success Response (200):

{
  "message": "File deleted successfully"
}

Error Responses:

401 Unauthorized: Missing or invalid JWT token
403 Forbidden: User does not own the file
404 Not Found: File not found

Behavior:

Validates file ownership before deletion
Deletes both physical file from storage and database record
If physical file is missing, continues with database deletion (logs error)
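
The deletion behavior above can be sketched as follows. The in-memory `db` dict, the `owner_id` and `path` field names, and the exception classes are stand-ins chosen for illustration; the backend uses MongoDB and NestJS exceptions.

```python
import logging
import os

logger = logging.getLogger("media")


class NotFound(Exception):
    """Maps to 404 Not Found."""


class Forbidden(Exception):
    """Maps to 403 Forbidden."""


def delete_file(db: dict, file_id: str, user_id: str) -> None:
    record = db.get(file_id)
    if record is None:
        raise NotFound(file_id)
    if record["owner_id"] != user_id:  # ownership check comes before any deletion
        raise Forbidden(file_id)
    try:
        os.remove(record["path"])
    except FileNotFoundError:
        # A missing physical file is logged but does not block record removal.
        logger.error("physical file missing: %s", record["path"])
    del db[file_id]
```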

PATCH `/media/:id/status`

Update the processing status of a file.

Authentication: Required (Bearer JWT token)

Path Parameters:

id: File ID (MongoDB ObjectId)

Request Body:

{
  "status": "processing",
  "progress": 50,
  "errorMessage": "Optional error message"
}

Example (curl):

curl -X PATCH http://localhost:3001/media/507f1f77bcf86cd799439011/status \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"status":"processing","progress":50}'

Success Response (200):

{
  "message": "File status updated successfully",
  "file": {
    "id": "507f1f77bcf86cd799439011",
    "status": "processing",
    "progress": 50
  }
}

Error Responses:

400 Bad Request: Invalid status value
401 Unauthorized: Missing or invalid JWT token
403 Forbidden: User does not own the file
404 Not Found: File not found

Valid Status Values:

uploading: File is being uploaded
ready: File is ready for processing
processing: File is being transcribed
completed: Transcription completed successfully
error: An error occurred during processing

Behavior:

Validates file ownership before updating
Emits real-time WebSocket update to user
Updates status, progress, and optional error message
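
A request body for this endpoint could be validated along these lines. This is a hedged Python sketch of the rules stated above; the 0-100 progress range comes from the feature list, while the function name and error messages are assumptions.

```python
VALID_STATUSES = {"uploading", "ready", "processing", "completed", "error"}


def validate_status_update(body: dict) -> dict:
    """Return the fields to persist, or raise ValueError (maps to 400)."""
    status = body.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {status!r}")
    update = {"status": status}

    if "progress" in body:
        progress = body["progress"]
        is_number = isinstance(progress, (int, float)) and not isinstance(progress, bool)
        if not is_number or not 0 <= progress <= 100:
            raise ValueError("progress must be a number between 0 and 100")
        update["progress"] = progress

    if body.get("errorMessage"):  # optional; empty values are dropped
        update["errorMessage"] = str(body["errorMessage"])
    return update
```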

POST `/media/:id/transcribe`

Initiate transcription for an audio/video file.

Authentication: Required (Bearer JWT token)

Path Parameters:

id: File ID (MongoDB ObjectId)

Example (curl):

curl -X POST http://localhost:3001/media/507f1f77bcf86cd799439011/transcribe \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Success Response (200):

{
  "message": "Transcription started",
  "file": {
    "id": "507f1f77bcf86cd799439011",
    "status": "processing"
  }
}

Error Responses:

400 Bad Request: File is not an audio/video file
401 Unauthorized: Missing or invalid JWT token
403 Forbidden: User does not own the file
404 Not Found: File not found
500 Internal Server Error: File is already processing or completed, transcription service error

Behavior:

Validates file ownership and type (audio/video only)
Checks file status (rejects if already processing or completed)
Updates status to processing immediately
Sends file to Python transcription service asynchronously
Emits real-time WebSocket updates during processing
On success: stores transcribed text, updates status to completed
On failure: updates status to error with error message
Returns immediately (transcription happens in background)
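
The sequence of checks and status changes above can be sketched as a small state machine. Everything here is illustrative: the dict-based file record, the `owner_id` and `transcribedText` field names, and the `enqueue` callback are assumptions, and the status codes simply follow the error table for this endpoint.

```python
from typing import Callable, Optional, Tuple


def start_transcription(file: dict, user_id: str,
                        enqueue: Callable[[str], None]) -> Tuple[int, str]:
    """Run the documented checks, mark the file as processing, and enqueue it."""
    if file["owner_id"] != user_id:
        return 403, "User does not own the file"
    if not file["mimetype"].startswith(("audio/", "video/")):
        return 400, "File is not an audio/video file"
    if file["status"] in ("processing", "completed"):
        return 500, "File is already processing or completed"
    file["status"] = "processing"  # updated before the background work starts
    enqueue(file["id"])            # returns immediately; work happens elsewhere
    return 200, "Transcription started"


def finish_transcription(file: dict, text: Optional[str] = None,
                         error: Optional[str] = None) -> None:
    """Record the outcome reported by the transcription service."""
    if error is None:
        file.update(status="completed", transcribedText=text)
    else:
        file.update(status="error", errorMessage=error)
```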

Supported MIME Types:

audio/mpeg (MP3)
audio/wav (WAV)
audio/mp4 (M4A)
video/mp4 (MP4)
audio/x-m4a (M4A)

Transcription Service:

Uses OpenAI Whisper (medium) model via HuggingFace Transformers
Supports GPU acceleration when available
Timeout: 5 minutes per file
Automatic chunking for long audio (30-second chunks)

GET `/media/:id/transcription`

Retrieve the transcription text and summary (if available) for a transcribed file.

Authentication: Required (Bearer JWT token)

Path Parameters:

id: File ID (MongoDB ObjectId)

Example (curl):

curl -X GET http://localhost:3001/media/507f1f77bcf86cd799439011/transcription \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Success Response (200) - Completed File with Summary:

{
  "status": "completed",
  "transcription": "This is the transcribed text from the audio file...",
  "fileId": "507f1f77bcf86cd799439011",
  "originalFilename": "audio.mp3",
  "summaryText": "AI-generated summary of the transcription...",
  "summaryStatus": "completed"
}

Success Response (200) - Completed File with Pending Summary:

{
  "status": "completed",
  "transcription": "This is the transcribed text from the audio file...",
  "fileId": "507f1f77bcf86cd799439011",
  "originalFilename": "audio.mp3",
  "summaryStatus": "pending"
}

Success Response (200) - Completed File with Processing Summary:

{
  "status": "completed",
  "transcription": "This is the transcribed text from the audio file...",
  "fileId": "507f1f77bcf86cd799439011",
  "originalFilename": "audio.mp3",
  "summaryStatus": "processing"
}

Success Response (200) - Completed File with Summary Error:

{
  "status": "completed",
  "transcription": "This is the transcribed text from the audio file...",
  "fileId": "507f1f77bcf86cd799439011",
  "originalFilename": "audio.mp3",
  "summaryStatus": "error",
  "summaryErrorMessage": "Summarization service unavailable"
}

Success Response (200) - Processing File:

{
  "status": "processing",
  "progress": 75,
  "message": "Transcription is in progress"
}

Success Response (200) - Error File:

{
  "status": "error",
  "message": "Transcription failed: Service unavailable"
}

Success Response (200) - Ready/Uploading File:

{
  "status": "ready",
  "message": "Transcription has not been started yet"
}

Error Responses:

401 Unauthorized: Missing or invalid JWT token
403 Forbidden: User does not own the file
404 Not Found: File not found
500 Internal Server Error: Completed file has no transcription text (data inconsistency)

Behavior:

Validates file ownership before returning transcription
Returns different responses based on file status:
- completed: Returns full transcription text with file metadata and summary data (if available)
- processing: Returns progress percentage and status message
- error: Returns error message from failed transcription
- ready/uploading: Returns message indicating transcription hasn't started
Summary fields included in response when transcription is completed:
- summaryText: The generated summary (only if summaryStatus is 'completed')
- summaryStatus: Current status of summarization (pending, processing, completed, error, or undefined)
- summaryErrorMessage: Error details if summarization failed
Only the file owner can retrieve transcription and summary (enforces data privacy)
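
The status-dependent responses above can be assembled by a single function, sketched here in Python. The stored field names (`transcribedText`, `errorMessage`) are assumptions; the response keys follow the examples in this section, and ownership is assumed to have been checked already.

```python
def transcription_response(file: dict) -> dict:
    """Build the response body for GET /media/:id/transcription."""
    status = file["status"]

    if status == "completed":
        if not file.get("transcribedText"):
            # Data inconsistency; the endpoint reports this as a 500.
            raise RuntimeError("completed file has no transcription text")
        body = {
            "status": "completed",
            "transcription": file["transcribedText"],
            "fileId": file["id"],
            "originalFilename": file["originalFilename"],
        }
        summary_status = file.get("summaryStatus")
        if summary_status is not None:
            body["summaryStatus"] = summary_status
        if summary_status == "completed":
            body["summaryText"] = file.get("summaryText")
        elif summary_status == "error":
            body["summaryErrorMessage"] = file.get("summaryErrorMessage")
        return body

    if status == "processing":
        return {"status": "processing", "progress": file.get("progress", 0),
                "message": "Transcription is in progress"}
    if status == "error":
        return {"status": "error",
                "message": file.get("errorMessage", "Transcription failed")}
    # ready / uploading
    return {"status": status, "message": "Transcription has not been started yet"}
```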

Use Cases:

Display transcribed text and summary side-by-side in UI after completion
Check transcription and summary status without fetching full file details
Implement "View Transcription & Summary" feature in frontend (US-11)
Poll for completion status (though WebSockets are preferred for real-time updates)

Best Practices:

Use WebSocket events for real-time status updates instead of polling this endpoint
This endpoint is ideal for retrieving transcription and summary after page reload or direct navigation
Handle all status responses gracefully in UI (show spinner for processing, error message for errors, etc.)
Display summary status indicators (pending, processing, completed, error) when showing transcription

Project Structure

textifying-speaking/
├── ts-back/                  # NestJS Backend
│   ├── src/
│   │   ├── auth/            # Authentication module
│   │   │   ├── dto/         # Data Transfer Objects
│   │   │   ├── guards/      # JWT AuthGuard
│   │   │   ├── strategies/  # JWT Strategy
│   │   │   ├── auth.controller.ts
│   │   │   ├── auth.service.ts
│   │   │   └── auth.module.ts
│   │   ├── users/           # Users module
│   │   │   ├── schemas/     # MongoDB schemas (User)
│   │   │   ├── users.service.ts
│   │   │   └── users.module.ts
│   │   ├── media/           # Media upload & transcription module
│   │   │   ├── schemas/     # MongoDB schemas (MediaFile)
│   │   │   ├── filters/     # Exception filters
│   │   │   ├── media.controller.ts
│   │   │   ├── media.service.ts
│   │   │   ├── media.gateway.ts  # WebSocket gateway
│   │   │   └── media.module.ts
│   │   ├── app.module.ts    # Main application module
│   │   └── main.ts          # Application entry point
│   ├── test/                # E2E tests
│   ├── uploads/             # Uploaded files storage
│   ├── Dockerfile
│   └── package.json
├── ts-front/                 # React Frontend
│   ├── src/
│   │   ├── components/
│   │   │   └── Navbar.jsx   # Navigation with auth UI & upload/dashboard links
│   │   ├── hooks/
│   │   │   └── useFileStatus.js  # WebSocket hook for real-time updates
│   │   ├── pages/
│   │   │   ├── Register.jsx # Registration page
│   │   │   ├── Login.jsx    # Login page
│   │   │   ├── Upload.jsx   # File upload page
│   │   │   ├── Dashboard.jsx # File management & transcription dashboard
│   │   │   └── HealthCheck.jsx
│   │   ├── store/
│   │   │   └── authStore.js # Zustand auth state
│   │   ├── App.jsx          # Main app component & routes
│   │   └── main.jsx         # Application entry point
│   ├── Dockerfile
│   └── package.json
├── ts-transcription/         # Python Transcription Service
│   ├── app.py               # Flask application with Whisper model
│   ├── requirements.txt     # Python dependencies
│   ├── Dockerfile           # Container with PyTorch & Transformers
│   └── README.md            # Service documentation
├── docker-compose.yml        # Docker services configuration
├── Makefile                  # Development commands
└── README.md                 # This file

Testing

Backend Tests

# Run all unit tests
make test-backend
# OR
docker exec ts-backend npm test

# Run E2E tests
make test-backend-e2e
# OR
docker exec ts-backend npm run test:e2e

# Run tests with coverage
make test-backend-cov
# OR
docker exec ts-backend npm run test:cov

# Run specific test suites
npm test -- media.service.spec.ts           # MediaService unit tests
npm test -- summarization.processor.spec.ts # Summarization unit tests
npm test -- media-summarize.e2e-spec.ts     # Summarization E2E tests

Frontend Validation

# Build frontend (validates JSX/imports)
cd ts-front && npm run build

# Run ESLint
cd ts-front && npm run lint

US-09 Testing

Automated test script validates the complete summarization feature:

# Run comprehensive US-09 tests
./test-us-09.sh

This script validates:

✅ Backend unit tests (MediaService + SummarizationProcessor)
✅ Frontend build (validates JSX/imports)
✅ Frontend linting (ESLint)
✅ Docker Compose configuration (service presence)

Test Results Summary:

Backend: 43/43 tests passing
- MediaService: 24 tests
- SummarizationProcessor: 6 tests
- Other modules: 13 tests
Frontend: Builds successfully with no errors
Frontend: ESLint passes with no warnings

Manual Testing Workflow

To test the complete summarization workflow with services running:

# 1. Start all services
make quickstart

# 2. Register a user
curl -X POST http://localhost:3001/auth/register \
  -H 'Content-Type: application/json' \
  -d '{"username":"testuser","email":"test@example.com","password":"password123"}'

# 3. Login to get token
TOKEN=$(curl -s -X POST http://localhost:3001/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"test@example.com","password":"password123"}' \
  | jq -r '.accessToken')

# 4. Upload a media file
FILE_ID=$(curl -s -X POST http://localhost:3001/media/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@/path/to/audio.mp3" \
  | jq -r '.file.id')

# 5. Start transcription
curl -X POST http://localhost:3001/media/$FILE_ID/transcribe \
  -H "Authorization: Bearer $TOKEN"

# 6. Wait for transcription to complete (check status via WebSocket or polling)

# 7. Start summarization
curl -X POST http://localhost:3001/media/$FILE_ID/summarize \
  -H "Authorization: Bearer $TOKEN"

# 8. Check summary status
curl -X GET http://localhost:3001/media \
  -H "Authorization: Bearer $TOKEN" \
  | jq --arg id "$FILE_ID" '.files[] | select(.id == $id) | {summaryStatus, summaryText}'
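
Step 6 of the workflow (waiting for the transcription to finish) can be scripted by polling GET /media/:id/transcription. The sketch below is one hedged way to do it with the Python standard library; the helper names, the 2-second interval, and the 5-minute timeout are assumptions, and WebSocket events remain the preferred mechanism in the application itself.

```python
import json
import time
import urllib.request
from typing import Callable


def make_fetcher(base_url: str, file_id: str, token: str) -> Callable[[], dict]:
    """Return a function that fetches the transcription status as a dict."""
    def fetch() -> dict:
        request = urllib.request.Request(
            f"{base_url}/media/{file_id}/transcription",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def wait_for_transcription(fetch_status: Callable[[], dict],
                           interval_s: float = 2.0,
                           timeout_s: float = 300.0,
                           sleep: Callable[[float], None] = time.sleep,
                           clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the status is completed or error; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        body = fetch_status()
        if body.get("status") in ("completed", "error"):
            return body
        if clock() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        sleep(interval_s)
```

For example, `wait_for_transcription(make_fetcher("http://localhost:3001", file_id, token))` blocks until the file reaches a final status, where `file_id` and `token` hold the values obtained in steps 3 and 4.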

Browser Testing

Open http://localhost:5173 in browser
Register/login as user
Upload an audio/video file
Click 'Transcribe' button
Wait for transcription to complete (watch floating progress indicator)
Click 'Summarize' button (purple button appears after transcription)
Wait for summarization to complete (watch floating progress indicator)
Click 'View Summary' button to see modal with summary and transcription
Test copy-to-clipboard buttons in modal
Verify real-time updates work (status badges update automatically)

US-10 Testing

US-10 is covered by the US-09 tests, because US-10 describes the background, real-time behavior that US-09 already implements.

Automated Tests:

# Run backend unit tests (includes SummarizationProcessor tests)
make test-backend

# Run all E2E tests
make test-backend-e2e

Test Coverage:

✅ BullMQ job queue integration (SummarizationProcessor)
✅ Background job processing with concurrency control
✅ Automatic retry mechanism with exponential backoff
✅ WebSocket real-time status broadcasts
✅ Error handling and recovery
✅ Multiple simultaneous job processing
✅ Job completion and failure scenarios

Manual Real-Time Testing:

1. Start all services: make quickstart
2. Open a browser at http://localhost:5173
3. Register or log in as a user
4. Upload multiple audio/video files
5. Start transcription on multiple files simultaneously
6. Observe the floating progress indicator showing all files being processed
7. Navigate to different pages and confirm that progress continues in the background
8. After the transcriptions complete, start summarization on multiple files
9. Observe:
- Purple gradient progress indicator for summarization
- Real-time toast notifications (started, completed, failed)
- Dashboard status badges update automatically
- User can continue browsing while processing occurs
10. Verify that summarization completes successfully
11. Click "View Summary" to see the generated summary

Test error handling by stopping the summarization service:

docker stop ts-summarization
# Try to summarize a file - should show error toast
docker start ts-summarization

WebSocket Event Testing:

# Monitor WebSocket events in browser console:
# 1. Open browser DevTools → Console
# 2. Look for "Summary status update received:" messages
# 3. Verify events contain: fileId, summaryStatus, summaryText, summaryErrorMessage

# Manual WebSocket connection test:
TOKEN=$(curl -s -X POST http://localhost:3001/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"test@example.com","password":"password123"}' \
  | jq -r '.accessToken')

# Connect via Socket.IO client and listen to 'summaryStatusUpdate' events
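
When checking these events by hand, it can help to validate the payload shape before trusting it. The sketch below is a type guard for the four fields listed above. The interface name, the choice to treat `summaryText` and `summaryErrorMessage` as optional, and the string types are assumptions; the backend's actual payload type is not shown in this README.

```typescript
// Assumed payload shape for 'summaryStatusUpdate' events; field types are guesses.
interface SummaryStatusUpdate {
  fileId: string;
  summaryStatus: string;
  summaryText?: string | null;
  summaryErrorMessage?: string | null;
}

function isOptionalString(value: unknown): boolean {
  return value === undefined || value === null || typeof value === "string";
}

// Returns true only when the required fields are strings and the optional
// fields are either absent, null, or strings.
function isSummaryStatusUpdate(value: unknown): value is SummaryStatusUpdate {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return (
    typeof record["fileId"] === "string" &&
    typeof record["summaryStatus"] === "string" &&
    isOptionalString(record["summaryText"]) &&
    isOptionalString(record["summaryErrorMessage"])
  );
}
```

With a Socket.IO client, a guard like this could run inside the handler registered through `socket.on('summaryStatusUpdate', ...)` before the UI state is updated.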

Background Processing Verification:

# Check Redis for queued jobs
# (BullMQ keeps waiting and active jobs in lists, completed and failed jobs in sorted sets)
make redis-cli
> KEYS *summarization*
> LLEN bull:summarization:wait
> LLEN bull:summarization:active
> ZCARD bull:summarization:completed
> ZCARD bull:summarization:failed
> exit

# Check summarization service logs
make logs-summarization

# Check backend processor logs
make logs-backend | grep SummarizationProcessor

Environment Variables

Backend (`ts-back/.env`)

MONGODB_URI=mongodb://mongodb:27017/textifying-speaking
JWT_SECRET=your-super-secret-jwt-key-change-in-production
PORT=3001
MEDIA_STORAGE_PATH=./uploads
REDIS_HOST=redis
REDIS_PORT=6379
TRANSCRIPTION_SERVICE_URL=http://transcription:5000
SUMMARIZATION_SERVICE_URL=http://summarization:5001
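
As a minimal sketch of how these variables might be read and validated at startup, the loader below fails fast on a missing value or an invalid port. The `BackendConfig` interface and the function names are hypothetical; the backend's real configuration code is not shown in this README and may be organized differently.

```typescript
// Hypothetical config loader for the variables listed above.
interface BackendConfig {
  mongodbUri: string;
  jwtSecret: string;
  port: number;
  mediaStoragePath: string;
  redisHost: string;
  redisPort: number;
  transcriptionServiceUrl: string;
  summarizationServiceUrl: string;
}

type Env = Record<string, string | undefined>;

function requireString(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable: ${key}`);
  }
  return value;
}

function requirePort(env: Env, key: string): number {
  const port = Number(requireString(env, key));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port in ${key}`);
  }
  return port;
}

function loadBackendConfig(env: Env): BackendConfig {
  return {
    mongodbUri: requireString(env, "MONGODB_URI"),
    jwtSecret: requireString(env, "JWT_SECRET"),
    port: requirePort(env, "PORT"),
    mediaStoragePath: requireString(env, "MEDIA_STORAGE_PATH"),
    redisHost: requireString(env, "REDIS_HOST"),
    redisPort: requirePort(env, "REDIS_PORT"),
    transcriptionServiceUrl: requireString(env, "TRANSCRIPTION_SERVICE_URL"),
    summarizationServiceUrl: requireString(env, "SUMMARIZATION_SERVICE_URL"),
  };
}
```

In a Node.js process the argument would typically be `process.env`; taking it as a parameter keeps the function easy to exercise with a plain object.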

Transcription Service (`ts-transcription/.env`)

PORT=5000

Summarization Service (`ts-summarization/.env`)

PORT=5001

Docker Compose

Environment variables are configured in docker-compose.yml for containerized deployments.

Security Features

✅ Password hashing with bcrypt (salt rounds: 10)
✅ JWT-based authentication with 1-hour token expiration
✅ Input validation (client-side and server-side)
✅ Email uniqueness enforcement
✅ Username uniqueness enforcement
✅ File ownership validation (users can only delete their own files)
✅ Status update ownership validation (users can only update their own files)
✅ Protected routes (authentication required for sensitive operations)
✅ JWT-secured WebSocket connections (authentication required for real-time updates)
✅ User-scoped WebSocket broadcasts (users only receive updates for their own files)
✅ File ownership validation for transcription (users can only transcribe their own files)
✅ File type validation for transcription (audio/video only)
✅ File ownership validation for summarization (users can only summarize their own files)
✅ Transcription completion validation for summarization (prevents summarization of unfinished transcriptions)
✅ CORS enabled for frontend communication
✅ MongoDB connection security
✅ Secure credential verification (constant-time comparison via bcrypt)
🔜 Rate limiting (future enhancement)
🔜 Email verification (future enhancement)
🔜 Refresh tokens (future enhancement)
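
The two summarization checks in the list above (file ownership and a finished transcription) can be pictured as a single guard that runs before a job is queued. The sketch below is only an illustration: the field names `ownerId` and `status`, and the use of `completed` as the finished state, are assumptions rather than the project's actual schema.

```typescript
// Hypothetical record shape; the real Mongoose schema may use different fields.
interface MediaFileRecord {
  ownerId: string; // id of the user who uploaded the file
  status: string;  // assumed values: uploaded, processing, completed, failed
}

type SummarizeCheck =
  | { allowed: true }
  | { allowed: false; reason: "not_owner" | "transcription_incomplete" };

function checkCanSummarize(file: MediaFileRecord, requestingUserId: string): SummarizeCheck {
  // Ownership is checked first so that other users learn nothing about the file's state.
  if (file.ownerId !== requestingUserId) {
    return { allowed: false, reason: "not_owner" };
  }
  // Summarization needs a finished transcription as its input.
  if (file.status !== "completed") {
    return { allowed: false, reason: "transcription_incomplete" };
  }
  return { allowed: true };
}
```

A controller could map `not_owner` and `transcription_incomplete` to different HTTP error responses; which status codes the backend actually returns is not covered in this section.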

Development Notes

Frontend uses JavaScript (JSX), not TypeScript
Backend uses TypeScript with strict validation
MongoDB uses Mongoose ODM for schema definition
All passwords are hashed before storage
Docker Compose manages service orchestration
Hot reload enabled for development mode

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
