Maya 2

A modern AI-powered Text-to-Speech application that generates Hindi and Telugu text using AI and converts it to natural-sounding speech.

🚀 Tech Stack

  • React 18 - Modern UI library
  • TypeScript - Type-safe JavaScript
  • Tailwind CSS - Utility-first CSS framework
  • shadcn/ui - Beautiful, accessible UI components
  • Vite - Lightning fast build tool
  • OpenRouter API - AI text generation
  • Maya AI TTS - Text-to-speech synthesis
  • ESLint - Code quality and consistency

✨ Features

  • 🤖 AI Text Generation - Generate text in Hindi and Telugu using OpenRouter API
  • 🗣️ Text-to-Speech - Convert text to natural speech using Maya AI (two models available)
    • Maya-Te_Hi: Hindi/Telugu voices with automatic transcription
    • Maya-English: Customizable English voices with voice presets
  • 📝 Audio Transcription - Automatic transcription with word-level timestamps using AssemblyAI
  • 🌍 Multi-Language Support - Hindi (हिंदी), Telugu (తెలుగు), and English
  • 🎨 Modern UI - Built with shadcn/ui components and Manrope font
  • 🎵 Audio Playback - Built-in audio player for generated speech
  • ⚡ Real-time Processing - Fast API integration with streaming support
  • 🎯 Custom Voice Configuration - Define custom voice descriptions and seeds for consistent output
  • 📊 Data Explorer - Browse and explore transcription data from S3 with audio playback

📦 Getting Started

Prerequisites

  • Node.js (v18 or higher)
  • npm or yarn

Installation

  1. Install dependencies:
npm install
  2. Set up environment variables:

Create a .env file in the root directory (or copy .env.example):

cp .env.example .env

Then add your API keys:

VITE_OPENROUTER_API_KEY=your_openrouter_api_key
VITE_MAYA_API_KEY=your_maya_api_key
VITE_MAYA_ENGLISH_API_KEY=your_maya_english_api_key
VITE_ASSEMBLYAI_API_KEY=your_assemblyai_api_key
VITE_AWS_ACCESS_KEY_ID=your_aws_access_key
VITE_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
VITE_S3_BUCKET_NAME=voice-assistant-transcriptions-2025
VITE_AWS_REGION=us-east-1

API Keys Required:

  • VITE_MAYA_API_KEY - For Maya TTS (maya-te_hi model)
  • VITE_MAYA_ENGLISH_API_KEY - For Maya English TTS (maya-english model)
  • VITE_OPENROUTER_API_KEY - For AI text generation
  • VITE_ASSEMBLYAI_API_KEY - For AssemblyAI audio transcription with word-level timestamps (supports Hindi & Telugu)
  • VITE_AWS_ACCESS_KEY_ID & VITE_AWS_SECRET_ACCESS_KEY - For AWS S3 access to fetch transcription data
  • VITE_S3_BUCKET_NAME - S3 bucket name for transcription storage
  • VITE_AWS_REGION - AWS region for S3 bucket
  3. Start the development server:
npm run dev

The app will open at http://localhost:3000
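A missing key in .env tends to show up later as a confusing API failure, so it can help to check the variables once at startup. The sketch below is one way to do that; the variable names come from the list above, while `REQUIRED_ENV_KEYS` and `findMissingEnvKeys` are hypothetical names and not part of this repository.

```typescript
// Variable names are taken from the .env listing in this README.
// The helper itself is an illustrative sketch, not code from the project.
const REQUIRED_ENV_KEYS = [
  "VITE_OPENROUTER_API_KEY",
  "VITE_MAYA_API_KEY",
  "VITE_MAYA_ENGLISH_API_KEY",
  "VITE_ASSEMBLYAI_API_KEY",
  "VITE_AWS_ACCESS_KEY_ID",
  "VITE_AWS_SECRET_ACCESS_KEY",
  "VITE_S3_BUCKET_NAME",
  "VITE_AWS_REGION",
] as const;

type EnvSource = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnvKeys(env: EnvSource): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Vite app this would typically be called with `import.meta.env` early in `main.tsx`, logging or throwing if the returned list is not empty.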

🛠️ Available Scripts

  • npm run dev - Start development server
  • npm run build - Build for production
  • npm run preview - Preview production build
  • npm run lint - Run ESLint

📁 Project Structure

Maya_2/
├── src/
│   ├── components/
│   │   └── ui/          # Reusable UI components (Button, Select, etc.)
│   ├── pages/
│   │   ├── Maya2.tsx    # Main TTS generation page
│   │   └── Data.tsx     # S3 data explorer page
│   ├── lib/
│   │   ├── api.ts       # API utilities for TTS
│   │   ├── s3.ts        # S3 service utilities
│   │   └── utils.ts     # General utilities
│   ├── App.tsx          # Main app with routing
│   ├── main.tsx
│   ├── index.css
│   └── vite-env.d.ts
├── public/
├── index.html
├── package.json
├── tsconfig.json
├── tailwind.config.js
└── vite.config.ts

📊 Data Page

The Data page lets you browse and explore transcription data stored in AWS S3.

S3 Bucket Structure

s3://voice-assistant-transcriptions-2025/
├── index.json (metadata file)
├── hindi/
│   ├── [filename_with_underscores]/
│   │   ├── audio.mp3 (original audio)
│   │   ├── ns_audio.mp3 (noise-suppressed audio)
│   │   └── transcription.json
│   └── ...
└── telugu/
    ├── [filename_with_underscores]/
    │   ├── audio.mp3 (original audio)
    │   ├── ns_audio.mp3 (noise-suppressed audio)
    │   └── transcription.json
    └── ...
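Given the layout above, the three object keys for an entry follow directly from its language and folder name. The following is a minimal sketch with hypothetical helper names (`buildEntryKeys`, `toObjectUrl`); it takes the folder name as given, since the exact rule for turning a filename into `[filename_with_underscores]` is not spelled out here. The URL helper uses the virtual-hosted-style S3 address and assumes the object is readable without signing, which may not hold for this bucket.

```typescript
type Language = "hindi" | "telugu";

interface EntryKeys {
  audio: string;
  nsAudio: string;
  transcription: string;
}

// Builds the three object keys for one entry folder, following the layout above.
// `folder` is the already-normalized folder name and is used as-is.
function buildEntryKeys(language: Language, folder: string): EntryKeys {
  const prefix = `${language}/${folder}`;
  return {
    audio: `${prefix}/audio.mp3`,
    nsAudio: `${prefix}/ns_audio.mp3`,
    transcription: `${prefix}/transcription.json`,
  };
}

// Virtual-hosted-style S3 URL. Assumption: the object can be fetched unsigned;
// a private bucket would need presigned URLs or SDK calls instead.
function toObjectUrl(bucket: string, region: string, key: string): string {
  const encodedKey = key
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}
```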

Transcription JSON Format

Each transcription.json file contains:

{
  "filename": "WhatsApp Audio 2025-10-24 at 19.52.25_9abf3fc6.mp3",
  "language": "telugu",
  "transcription": {
    "native_transcription": "Native script transcription...",
    "romanized_transcription": "Romanized version...",
    "english_translation": "English translation...",
    "focal_point": "Summary of the conversation..."
  }
}
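Since the project avoids `any`, data fetched from S3 needs to be narrowed from `unknown` before use. A possible sketch of a type and runtime guard for the format above follows; the field names mirror the JSON sample, while `TranscriptionFile`, `isTranscriptionFile`, and `parseTranscription` are hypothetical names chosen for illustration.

```typescript
// Shape mirrors the transcription.json sample above.
interface TranscriptionFile {
  filename: string;
  language: string;
  transcription: {
    native_transcription: string;
    romanized_transcription: string;
    english_translation: string;
    focal_point: string;
  };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Runtime check that an unknown value has every expected string field.
function isTranscriptionFile(value: unknown): value is TranscriptionFile {
  if (!isRecord(value)) return false;
  const inner = value["transcription"];
  if (!isRecord(inner)) return false;
  return (
    typeof value["filename"] === "string" &&
    typeof value["language"] === "string" &&
    typeof inner["native_transcription"] === "string" &&
    typeof inner["romanized_transcription"] === "string" &&
    typeof inner["english_translation"] === "string" &&
    typeof inner["focal_point"] === "string"
  );
}

// Parses raw JSON text and throws if the shape does not match.
function parseTranscription(json: string): TranscriptionFile {
  const data: unknown = JSON.parse(json);
  if (!isTranscriptionFile(data)) {
    throw new Error("Unexpected transcription.json shape");
  }
  return data;
}
```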

Features

  • Filter by Language - View Hindi, Telugu, or all transcriptions
  • Audio Playback - Play original and noise-suppressed audio directly from S3
  • Expandable Details - View the native and romanized transcriptions and the English translation
  • Focal Point Summary - Quick overview of conversation topics
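The language filter can be expressed as a small pure function. This is a sketch only: the entry shape is a stand-in (the actual structure of index.json is not documented here), and `filterByLanguage` is a hypothetical name.

```typescript
type LanguageFilter = "hindi" | "telugu" | "all";

// Works with any entry type that carries a `language` field (assumed shape).
function filterByLanguage<T extends { language: string }>(
  entries: readonly T[],
  filter: LanguageFilter,
): T[] {
  if (filter === "all") {
    // Return a copy so callers never mutate the source list.
    return entries.slice();
  }
  return entries.filter((entry) => entry.language === filter);
}
```

Keeping the filter free of React state makes it straightforward to wrap in `useMemo` or to unit test on its own.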

🎨 Styling

This project uses Tailwind CSS for styling, with the following conventions:

  • Utility-first approach - Compose designs with utility classes
  • Responsive design - Mobile-first responsive breakpoints
  • Custom theming - Easily extend the default theme in tailwind.config.js

📝 TypeScript Configuration

The project uses strict TypeScript settings:

  • No implicit any types
  • Strict null checks enabled
  • All strict mode flags enabled
  • Full type safety throughout the codebase

🔧 Development Standards

  • TypeScript Strict Mode: Enabled for maximum type safety
  • ESLint: Configured with React and TypeScript rules
  • Component Structure: Organized by feature and UI components
  • No any types: Explicit typing required throughout

📄 License

MIT
