ASMR demo: video-1737110239209.webm (typo in video, ignore it)
Digital Human demo: output2_added_subtitle.mp4
Give a star ⭐ if you like it!
Kokoro is a trending, top-2 TTS model on Hugging Face.
This repo provides insanely fast Kokoro inference in Rust. You can now build your own TTS engine powered by Kokoro and run fast inference with a single koko command.
kokoros is a Rust crate that provides easy-to-use TTS capabilities.
You can call koko directly in the terminal to synthesize audio.
kokoros uses a relatively small model (87M parameters), yet it produces extremely high-quality voices.
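Beyond the koko binary, the crate can in principle be used as a library from your own Rust code. The sketch below is only a rough illustration under assumptions: the type name TTSKoko, the synth method, and the file paths are hypothetical placeholders, not the crate's confirmed API; check the crate source for the real signatures.

```rust
// Hypothetical sketch of library usage; `TTSKoko`, `synth`, and the paths
// below are placeholders, not the confirmed kokoros API.
use kokoros::TTSKoko; // hypothetical import path

fn main() {
    // Load the ONNX model and the voices.json fetched by scripts/fetch_voices.py.
    let tts = TTSKoko::new("checkpoints/kokoro.onnx", "data/voices.json");

    // Synthesize one sentence with a given voice style and write it as WAV.
    tts.synth("Hello, this is a TTS test", "af_sky", "tmp/output.wav");
}
```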
Language support:
- English;
- Chinese (partly);
- Japanese (partly);
- German (partly);
🔥🔥🔥🔥🔥🔥🔥🔥🔥 The Kokoros Rust version is getting a lot of attention right now. If you are also interested in insanely fast inference, embedded builds, WASM support, etc., please star this repo! We keep updating it.
New Discord community: https://discord.gg/E566zfDWqD. Please join us if you are interested in Rust Kokoro.
- 2025.01.22: 🔥🔥🔥 Streaming mode supported. You can now use --stream to have fun with stream mode, kudos to mroigo;
- 2025.01.17: 🔥🔥🔥 Style mixing supported! Now, listen to the output ASMR effect by simply specifying the style af_sky.4+af_nicole.5;
- 2025.01.15: OpenAI-compatible server supported, the OpenAI format is still being polished!
- 2025.01.15: Phonemizer supported! Now Kokoros can run E2E inference without any other dependencies! Kudos to @tstm;
- 2025.01.13: Espeak-ng tokenizer and phonemizer supported! Kudos to @mindreframer;
- 2025.01.12: Released Kokoros;
- Install required Python packages:
pip install -r scripts/requirements.txt

- Initialize voice data:
python scripts/fetch_voices.py

This step fetches the required voices.json data file, which is necessary for voice synthesis.
- Build the project:
cargo build --release
./target/release/koko -h
./target/release/koko text "Hello, this is a TTS test"
The generated audio will be saved to tmp/output.wav by default. You can customize the save location with the --output or -o option:
./target/release/koko text "I hope you're having a great day today!" --output greeting.wav
./target/release/koko file poem.txt
For a file with 3 lines of text, the speech audio files tmp/output_0.wav, tmp/output_1.wav, and tmp/output_2.wav will be written by default. You can customize the save location with the --output or -o option, using {line} as the line number:
./target/release/koko file lyrics.txt -o "song/lyric_{line}.wav"
- Start the server:
./target/release/koko openai

- Make API requests using either curl or Python:
Using curl:
curl -X POST http://localhost:3000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "anything can go here",
"input": "Hello, this is a test of the Kokoro TTS system!",
"voice": "af_sky"
}' \
--output sky-says-hello.wav

Using Python:
python scripts/run_openai.py
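If you prefer calling the endpoint from Rust instead of the bundled script, here is a minimal client sketch. It assumes the reqwest (with the blocking and json features) and serde_json crates as extra dependencies, and that the server is already running on localhost:3000 as started above.

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Same payload as the curl example above.
    let body = serde_json::json!({
        "model": "anything can go here",
        "input": "Hello, this is a test of the Kokoro TTS system!",
        "voice": "af_sky"
    });

    // POST to the OpenAI-compatible speech endpoint and keep the raw bytes.
    let resp = reqwest::blocking::Client::new()
        .post("http://localhost:3000/v1/audio/speech")
        .json(&body)
        .send()?
        .error_for_status()?;

    // Save the returned audio as a WAV file.
    fs::write("sky-says-hello.wav", resp.bytes()?)?;
    Ok(())
}
```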
The stream option will start the program, reading lines of input from stdin and writing WAV audio to stdout. Use it in conjunction with piping.
./target/release/koko stream > live-audio.wav
# Start typing some text to generate speech for and hit enter to submit
# Speech will append to `live-audio.wav` as it is generated
# Hit Ctrl+D to exit
echo "Suppose some other program was outputting lines of text" | ./target/release/koko stream > programmatic-audio.wav
- Build the image:

docker build -t kokoros .

- Run the image, passing options as described above:
# Basic text to speech
docker run -v ./tmp:/app/tmp kokoros text "Hello from docker!" -o tmp/hello.wav
# An OpenAI server (with appropriately bound port)
docker run -p 3000:3000 kokoros openai

Since Kokoro's capabilities are not yet finalized, this repo will keep tracking the status of Kokoro, and hopefully we can have language support including English, Mandarin, Japanese, German, French, etc.
Copyright reserved by Lucas Jin under the Apache License.
