🎙️ Audio Scribe

A full-stack web application that transcribes audio files and corrects the grammar of the resulting text using AI.

Audio Scribe demo screens

Home Page

Transcription Flow

Grammar Correction

Tech Stack

Frontend: React, Vite, Tailwind CSS
Backend: FastAPI (Python)
AI:
- Transcription: openai-whisper (local model)
- Grammar Correction: Google Gemini API
Infrastructure: Docker, Docker Compose, Nginx

Features

Seamless Audio Upload: Drag-and-drop or browse for audio files (MP3, WAV, M4A, etc.).
Accurate Transcription: Utilizes OpenAI's Whisper model for fast and accurate speech-to-text conversion.
AI-Powered Grammar Correction: One-click grammar and fluency correction using the Gemini API.
Dark Mode: Sleek, user-friendly interface with a dark mode toggle.
Containerized: Fully containerized with Docker for consistent, cross-platform deployment.
Automated CI Pipeline: GitHub Actions for automated testing, linting, and build verification.

Architecture

The application is composed of two main services orchestrated by Docker Compose: a frontend container and a backend container.

graph TB
    %% ================= USER =================
    User["User"]

    %% ================= FRONTEND =================
    subgraph Frontend["Frontend"]
        direction TB
        Nginx["Nginx Reverse Proxy"]
        ReactApp["React UI (Vite + Tailwind)"]
        Nginx -.-> ReactApp
    end

    %% ================= BACKEND =================
    subgraph Backend["Backend"]
        direction TB
        FastAPI["FastAPI Application"]
        
        subgraph Services["Services Layer"]
            direction LR
            TranscriptionService["Transcription Service"]
            GrammarService["Grammar Fixer Service"]
        end
        
        WhisperModel["Whisper Model<br/><small>Local inference (in container)</small>"]
    end

    %% ================= EXTERNAL =================
    subgraph External["External Services"]
        GeminiAPI["Gemini API<br/><small>External HTTP API</small>"]
    end

    %% =============== FLOWS =================
    linkStyle default stroke:#000,stroke-width:2.5px,fill:none




    User -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>1. HTTP Request</span> "| Nginx
    Nginx -->|" <span style='background:#333;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>2. Serve Static</span> "| ReactApp
    ReactApp -->|" <span style='background:#555;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>3. API Calls</span> "| Nginx
    Nginx -->|" <span style='background:#777;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>4. Proxy</span> "| FastAPI

    FastAPI -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>5a. /transcribe</span> "| TranscriptionService
    TranscriptionService -->|" <span style='background:#222;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>6. Process</span> "| WhisperModel
    WhisperModel -.->|" <span style='background:#444;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>7. Text</span> "| TranscriptionService

    FastAPI -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>5b. /fix-grammar</span> "| GrammarService
    GrammarService -->|" <span style='background:#222;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>8. HTTP</span> "| GeminiAPI
    GeminiAPI -.->|" <span style='background:#444;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>9. Corrected</span> "| GrammarService

    TranscriptionService -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>Response</span> "| FastAPI
    GrammarService -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>Response</span> "| FastAPI
    FastAPI -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>Response</span> "| Nginx
    Nginx -->|" <span style='background:#000;color:#fff;padding:4px 8px;border-radius:12px;font-size:14px;'>Response</span> "| User

    %% =============== STYLES =================
    classDef userStyle fill:#ffffff,stroke:#000,stroke-width:2px,color:#000,rx:10,ry:10
    classDef nodeStyle fill:#fdfdfd,stroke:#000,stroke-width:1.5px,color:#000,rx:8,ry:8
    classDef modelStyle fill:#f2f2f2,stroke:#000,stroke-width:1.5px,stroke-dasharray: 4 3,color:#000,rx:8,ry:8
    classDef containerStyle fill:#e9ecef,stroke:#000,stroke-width:1px,color:#000,font-weight:bold,rx:12,ry:12


    class User userStyle
    class Nginx,ReactApp,FastAPI,TranscriptionService,GrammarService nodeStyle
    class WhisperModel,GeminiAPI modelStyle
    class Frontend,Backend,Services,External containerStyle

CI Status

Getting Started

Prerequisites

Docker & Docker Compose
A Google Gemini API Key — used for grammar correction. You can get one from Google AI Studio.
An OpenAI API Key — (optional but not necessary) if you want to use OpenAI Whisper API for transcription.

Installation

Clone the repository:

git clone https://github.com/romanvlad95/audio-scribe
cd audio-scribe

Create the environment file: Create a file named .env in the project root and add your Gemini API key. The transcription uses a local model and does not require a key.
```
# .env
GEMINI_API_KEY="your-gemini-api-key"
```
Build and run with Docker Compose:
```
docker-compose up --build
```
The application will become available at http://localhost:3000.

Usage

Open your web browser and navigate to http://localhost:3000.
Drag and drop an audio file or use the "Browse Files" button to select one.
Click "Transcribe Audio".
Once transcription is complete, you can click "Fix Grammar" to get a corrected version of the text.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
backend		backend
docs/screenshots		docs/screenshots
frontend		frontend
.dockerignore		.dockerignore
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🎙️ Audio Scribe

Audio Scribe demo screens

Home Page

Transcription Flow

Grammar Correction

Tech Stack

Features

Architecture

CI Status

Getting Started

Prerequisites

Installation

Usage

License

About

Uh oh!

Releases

Packages

Languages

romanvlad95/audio-scribe

Folders and files

Latest commit

History

Repository files navigation

🎙️ Audio Scribe

Audio Scribe demo screens

Home Page

Transcription Flow

Grammar Correction

Tech Stack

Features

Architecture

CI Status

Getting Started

Prerequisites

Installation

Usage

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages