A FastAPI wrapper for NeuTTS-Air, providing an OpenAI-compatible text-to-speech API.
- OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API
- CPU-Only Inference: Optimized for CPU deployment
- FastAPI Backend: High-performance async API server
- Docker Support: Easy containerized deployment
- Voice Cloning: Support for custom voice models
- Clone the repository:
    git clone https://github.com/hasithdd/NeuTTS-FastAPI.git
    cd NeuTTS-FastAPI
- Build and run with Docker Compose:
    cd docker/cpu
    docker-compose up --build
- Test the API:
    curl -X POST "http://localhost:8000/v1/audio/speech" \
      -H "Content-Type: application/json" \
      -d '{"input": "Hello, world!", "voice": "dave"}' \
      --output speech.wav
- Install dependencies:
    pip install uv
    uv sync
- Run the server:
    uv run uvicorn api.src.main:app --host 0.0.0.0 --port 8000
POST /v1/audio/speech: Generate speech from text using the specified voice.
Request Body:
{
  "input": "Text to convert to speech",
  "voice": "dave"
}
Supported Voices:
- dave: Male voice
- jo: Female voice
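The request body above can also be built from Python. The following is a minimal sketch using only the standard library, not part of this repository. It assumes the server is reachable at `http://localhost:8000` as in the curl examples; the helper names `build_speech_request` and `synthesize` are hypothetical, and the client-side checks (non-empty text, known voice) are an illustrative assumption rather than documented server behavior.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 8000, as in the curl examples.
BASE_URL = "http://localhost:8000"
# The two voices listed under "Supported Voices".
SUPPORTED_VOICES = {"dave", "jo"}


def build_speech_request(text: str, voice: str = "dave") -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    # Client-side checks only; the server may validate differently.
    if not text.strip():
        raise ValueError("input text must not be empty")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    payload = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str = "dave", out_path: str = "speech.wav") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, `synthesize("Hello, world!", voice="jo")` would write the response to `speech.wav`. Keeping request construction separate from sending makes the payload easy to inspect without a live server.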
Response:
- Content-Type: audio/wav
- Body: WAV audio data
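To confirm that a response body really is WAV audio, the bytes can be inspected with Python's standard `wave` module. This is a small sketch, not part of this repository: `describe_wav` is a hypothetical helper, and it assumes the server returns PCM-encoded WAV, which is the variant the `wave` module reads.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of WAV bytes.

    Raises wave.Error (or EOFError for truncated input) if the data
    is not a WAV file that the standard library can parse.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

For example, after saving a response with curl, `describe_wav(open("hello.wav", "rb").read())` reports the channel count, sample rate, and duration of the generated clip.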
Example:
curl -X POST "http://localhost:8000/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{"input": "Hello, world!", "voice": "dave"}' \
--output hello.wav
NeuTTS-FastAPI/
├── api/
│   └── src/
│       ├── main.py                      # FastAPI application
│       ├── routers/
│       │   └── openai_compatible.py     # TTS endpoint
│       └── voices/                      # Voice model files
│           ├── dave.pt
│           └── jo.pt
├── neutts-air/                          # NeuTTS-Air library
├── docker/
│   └── cpu/                             # Docker configuration
├── pyproject.toml                       # Project dependencies
└── README.md
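In the layout above, each voice is stored as a `.pt` file under `api/src/voices/`, and the file name matches the voice name used in requests. As an illustration only (this is not the project's actual loading code, and `list_voices` is a hypothetical helper), the available voice names could be derived from that directory like this:

```python
from pathlib import Path


def list_voices(voices_dir: Path) -> list[str]:
    """Return voice names, taken from the stems of *.pt files in voices_dir."""
    return sorted(p.stem for p in Path(voices_dir).glob("*.pt"))
```

Run from the repository root, `list_voices(Path("api/src/voices"))` would be expected to return `dave` and `jo` for the files shown in the tree.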
- Python 3.10+
- NeuTTS-Air
- FastAPI
- PyTorch (CPU)
- SoundFile
- Other dependencies listed in pyproject.toml
uv run pytest
uv run black .
uv run isort .
- GPU Support: Add CUDA acceleration for faster inference on GPU hardware
- Voice Cloning: Enable users to create custom voices from audio samples
- Streaming Audio: Implement real-time audio streaming for low-latency TTS
- Multiple Audio Formats: Support MP3, FLAC, and other popular audio formats
- Batch Processing: Allow processing multiple text inputs in a single request
- Web Interface: Create a user-friendly web UI for testing and demonstration
- Voice Mixing: Combine multiple voices or adjust voice characteristics
- Emotion Control: Add parameters for controlling speech emotion and style
- Multilingual Support: Extend beyond English to support other languages
- API Rate Limiting: Implement request throttling and usage controls
- Performance Optimization: Benchmark and optimize inference speed
- Comprehensive Testing: Add unit tests, integration tests, and CI/CD pipeline
- API Documentation: Generate interactive OpenAPI/Swagger documentation
- Container Optimization: Reduce Docker image size and improve startup time
- Monitoring & Metrics: Add logging, metrics, and health monitoring
See Contributing.md for guidelines.
MIT License - see LICENSE file for details.
- Based on NeuTTS-Air
- Inspired by Kokoro-FastAPI