127 changes: 127 additions & 0 deletions chapters/text-to-speech-api/pages/live-text-to-speech.mdx
@@ -0,0 +1,127 @@
---
title: Live Text To Speech
description: "Live Text To Speech API Guide"
---

## Connecting and Configuring

1. **Connection**: Establish a WebSocket (wss) connection to `/text/audio/audio-synthesis/`.
2. **Configuration**: Configure the stream by selecting a speaker voice.
3. **Data**: Send the text to synthesize.

## Modes of Operation

### Pre-recorded Speaker Voices

Select one of the available pre-recorded speaker voices (see the "Available Speaker Voices" section below). After connecting to `wss://api.gladia.io/text/audio/audio-synthesis/`, send a configuration message:

```json
{
  "speaker_voice_behaviour": "pre recorded voice",
  "pre_recorded_speaker_voice": "Select from the list of available voices"
}
```

Then send the text to synthesize:

```json
{
  "text": "Enter your text here",
  "language": "Select your language (e.g., en, es, fr)"
}
```
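The configuration-then-data sequence above can be sketched in Python. This is a minimal illustration, not the official client: `build_live_messages` and `send_all` are hypothetical helper names, and `send` stands in for whatever method your WebSocket client uses to write one text frame.

```python
import json


def build_live_messages(voice, text, language):
    """Return the configuration message and the data message as JSON strings."""
    config = {
        "speaker_voice_behaviour": "pre recorded voice",
        "pre_recorded_speaker_voice": voice,
    }
    data = {"text": text, "language": language}
    # Order matters: the configuration is sent before the text.
    return [json.dumps(config), json.dumps(data)]


def send_all(send, messages):
    """Send each message in order.

    `send` is any callable that writes one text frame (an assumption;
    for example, the send method of a WebSocket client connected to
    wss://api.gladia.io/text/audio/audio-synthesis/).
    """
    for message in messages:
        send(message)


# Demonstration without a network connection: collect the frames in a list.
frames = []
send_all(frames.append, build_live_messages("Tammy Grit", "Hello there.", "en"))
for frame in frames:
    print(frame)
```

Passing the sender as a callable keeps the message construction independent of any particular WebSocket library.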

### Cloned Speaker Voice

To use a cloned speaker voice, connect to `wss://api.gladia.io/text/audio/audio-synthesis/` and send a configuration message containing the base64-encoded voice sample and its sample rate:

```json
{
  "cloned_speaker_voice": "base64 audio content",
  "cloned_speaker_voice_sample_rate": 44100
}
```

Then send the text to synthesize:

```json
{
  "text": "Enter your text here",
  "language": "Select your language (e.g., en, es, fr)"
}
```
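
One way to produce the cloned-voice configuration is to read the sample rate from a WAV file and base64-encode the audio. The sketch below assumes the `cloned_speaker_voice` field accepts the whole WAV file encoded as base64; the documentation does not say whether it expects the full file or raw samples, so verify this before relying on it. `build_cloned_voice_config` and `make_silent_wav` are hypothetical helpers.

```python
import base64
import io
import json
import wave


def build_cloned_voice_config(wav_bytes):
    """Build the configuration message for a cloned voice from WAV bytes."""
    # Read the sample rate from the WAV header instead of hard-coding it.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_rate = wav.getframerate()
    config = {
        # Assumption: the API accepts the full WAV file, base64-encoded.
        "cloned_speaker_voice": base64.b64encode(wav_bytes).decode("ascii"),
        "cloned_speaker_voice_sample_rate": sample_rate,
    }
    return json.dumps(config)


def make_silent_wav(sample_rate=44100, seconds=0.1):
    """Create a short silent mono 16-bit WAV in memory (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


config_message = build_cloned_voice_config(make_silent_wav())
print(json.loads(config_message)["cloned_speaker_voice_sample_rate"])
```

In practice you would replace `make_silent_wav()` with the bytes of a real voice recording, for example `open("sample.wav", "rb").read()`.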
## Supported Languages

The API supports a variety of languages, including:

- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Polish (pl)
- Turkish (tr)
- Russian (ru)
- Dutch (nl)
- Czech (cs)
- Arabic (ar)
- Chinese (zh-cn)
- Hungarian (hu)
- Korean (ko)
- Japanese (ja)
- Hindi (hi)

This range of supported languages makes it easy to create audio content for a global audience.
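
A client can check the `language` value against the codes listed above before sending a request. The following is a small sketch; `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names, and lowercasing the input is a convenience of this helper, not documented API behaviour.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "hu": "Hungarian", "ko": "Korean",
    "ja": "Japanese", "hi": "Hindi",
}


def normalize_language(code):
    """Return the code in the lowercase form listed above, or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized


print(normalize_language("zh-CN"))
```

Failing early on an unknown code avoids a round trip to the API for a request that cannot succeed.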


## Available Speaker Voices

The Live Text to Speech API offers a diverse range of speaker voices to choose from. Here is a list of available voices:

- Gitta Nikolina
- Henriette Usha
- Sofia Hellen
- Tammy Grit
- Tanja Adelina
- Vjollca Johnnie
- Andrew Chipper
- Badr Odhiambo
- Dionisio Schuyler
- Royston Min
- Viktor Eka
- Abrahan Mack
- Adde Michal
- Baldur Sanjin
- Craig Gutsy
- Damien Black
- Gilberto Mathias
- Ilkin Urbano
- Kazuhiko Atallah
- Ludvig Milivoj
- Suad Qasim
- Torcull Diarmuid
- Viktor Menelaos
- Zacharie Aimilios
- Nova Hogarth
- Maja Ruoho
- Uta Obando
- Lidiya Szekeres
- Chandra MacFarland
- Szofi Granger
- Camilla Holmström
- Lilya Stainthorpe
- Zofija Kendrick
- Narelle Moon
- Barbora MacLean
- Alexandra Hisakawa
- Alma María
- Rosemary Okafor
- Ige Behringer
- Filip Traverse
- Damjan Chapman
- Wulf Carlevaro
- Aaron Dreschner
- Kumar Dahl
- Eugenio Mataracı
- Ferran Simen
- Xavier Hayasaka
- Luis Moray
- Marcos Rudaski




110 changes: 110 additions & 0 deletions chapters/text-to-speech-api/pages/text-to-speech.mdx
@@ -0,0 +1,110 @@
---
title: Pre Recorded Text To Speech
description: "Guide on using Pre Recorded Text To Speech API for audio synthesis from text."
---

This API allows you to synthesize audio from text using two different modes: with pre-recorded speaker voices or with a cloned speaker voice.
Depending on your needs, you can choose the voice that best fits your project from a wide selection of pre-recorded voices,
or you can opt to clone a specific voice using a provided audio file.

### Modes of Operation

1. **Pre-recorded Speaker Voices**:
This mode uses a library of pre-recorded voices.
You can select a voice that suits your requirements and provide the text you wish to synthesize.
The API will then generate audio using the selected voice.
The available voices include but are not limited to Claribel Dervla, Daisy Studious, Gracie Wise, and many more.
Each voice has its own tone and style, providing a range of options for your audio content.

2. **Cloned Speaker Voice**:
If you require a more personalized voice, this mode allows you to clone a specific voice from an audio file you provide.
This is particularly useful for creating a unique voice for your brand or for specific characters in storytelling applications.
Comment on lines +12 to +21
Contributor

The "Modes of Operation" section clearly differentiates between using pre-recorded speaker voices and cloning a speaker voice. For the "Cloned Speaker Voice" mode, it would be helpful to include more details on how to provide the audio file for cloning, such as acceptable formats and how to encode or reference the file in the request.


### Request Format

To use the API, send a POST request to `/text/audio/audio-synthesis/` with a JSON payload specifying the `speaker_voice_behaviour` (either `"pre recorded voice"` or `"cloned voice"`), the chosen `pre_recorded_speaker_voice` or `cloned_speaker_voice` file (if applicable), the `text` to be synthesized, and the `language`.
Contributor

In the "Request Format" section, the documentation provides a concise overview of the required JSON payload structure. To improve clarity, consider adding a brief description for each field in the payload, especially for fields like speaker_voice_behaviour, to explain the expected values and their effects.


### Response

The API responds with an audio file in WAV/PCM format at a 24,000 Hz sample rate, suitable for applications ranging from virtual assistants to audiobooks.
Contributor

The "Response" section clearly states the format and quality of the audio file returned by the API. For completeness, consider mentioning any potential error responses and their meanings to help users troubleshoot issues with their requests.
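
As a client-side sanity check on the response described above, the returned bytes can be parsed with Python's standard `wave` module to confirm that they form a WAV file at the stated 24,000 Hz. This is only a sketch: `describe_wav` and `is_expected_format` are hypothetical helpers, and the API's actual error responses are not documented here, so the code simply treats anything that is not a readable WAV as unexpected.

```python
import io
import wave

EXPECTED_SAMPLE_RATE = 24000  # sample rate stated in the Response section


def describe_wav(audio_bytes):
    """Return basic properties of a WAV payload held in memory."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "sample_rate": frame_rate,
            "channels": wav.getnchannels(),
            "duration_seconds": wav.getnframes() / frame_rate,
        }


def is_expected_format(audio_bytes):
    """True if the payload is a readable WAV at the expected sample rate."""
    try:
        return describe_wav(audio_bytes)["sample_rate"] == EXPECTED_SAMPLE_RATE
    except (wave.Error, EOFError):
        # Not a WAV file, for example a JSON or text error body.
        return False
```

Calling `is_expected_format(response_bytes)` before writing the audio to disk helps distinguish a valid result from an error body.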


### Example
Here is an example payload for a pre-recorded voice:

`POST https://api.gladia.io/text/audio/audio-synthesis/`
```json
{
"speaker_voice_behaviour": "pre recorded voice",
"pre_recorded_speaker_voice": "Claribel Dervla",
"text": "Hello, welcome to our Text to Speech API. This is an example using a pre-recorded speaker voice.",
"language": "en"
}
```

Here is an example payload for a cloned speaker voice:

`POST https://api.gladia.io/text/audio/audio-synthesis/`
```json
{
"speaker_voice_behaviour": "cloned voice",
"cloned_speaker_voice": "file_path_to_cloned_voice_sample.wav",
"text": "This is an example using a cloned speaker voice.",
"language": "en"
}
```
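
The payloads above could be sent from Python roughly as follows, using only the standard library. This is a hedged sketch: the `x-gladia-key` authentication header and the helper names `build_request` and `synthesize` are assumptions not stated in this guide, so check the API reference for the actual authentication scheme.

```python
import json
import urllib.request

API_URL = "https://api.gladia.io/text/audio/audio-synthesis/"


def build_request(payload, api_key):
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-gladia-key": api_key,  # assumed header name, verify before use
        },
    )


def synthesize(payload, api_key, output_path):
    """Send the request and write the returned audio bytes to output_path."""
    with urllib.request.urlopen(build_request(payload, api_key)) as response:
        audio = response.read()
    with open(output_path, "wb") as audio_file:
        audio_file.write(audio)


example_payload = {
    "speaker_voice_behaviour": "pre recorded voice",
    "pre_recorded_speaker_voice": "Claribel Dervla",
    "text": "Hello, welcome to our Text to Speech API.",
    "language": "en",
}
# To perform the call: synthesize(example_payload, "YOUR_API_KEY", "output.wav")
```

Separating `build_request` from `synthesize` makes it possible to inspect the request before any network traffic occurs.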

### Supported Languages

The API supports a variety of languages, including English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Hungarian (hu), Korean (ko), Japanese (ja), and Hindi (hi). This range of supported languages makes it easy to create audio content for a global audience.
Contributor

The "Supported Languages" section is informative and mirrors the content in the live-text-to-speech.mdx file. Consistency between documents is good, but ensure that any updates to supported languages are reflected across all relevant documentation to maintain accuracy.



### Available Speaker Voices

The Pre Recorded Text To Speech API offers a diverse range of speaker voices to choose from. Here is a list of available voices:

- Gitta Nikolina
- Henriette Usha
- Sofia Hellen
- Tammy Grit
- Tanja Adelina
- Vjollca Johnnie
- Andrew Chipper
- Badr Odhiambo
- Dionisio Schuyler
- Royston Min
- Viktor Eka
- Abrahan Mack
- Adde Michal
- Baldur Sanjin
- Craig Gutsy
- Damien Black
- Gilberto Mathias
- Ilkin Urbano
- Kazuhiko Atallah
- Ludvig Milivoj
- Suad Qasim
- Torcull Diarmuid
- Viktor Menelaos
- Zacharie Aimilios
- Nova Hogarth
- Maja Ruoho
- Uta Obando
- Lidiya Szekeres
- Chandra MacFarland
- Szofi Granger
- Camilla Holmström
- Lilya Stainthorpe
- Zofija Kendrick
- Narelle Moon
- Barbora MacLean
- Alexandra Hisakawa
- Alma María
- Rosemary Okafor
- Ige Behringer
- Filip Traverse
- Damjan Chapman
- Wulf Carlevaro
- Aaron Dreschner
- Kumar Dahl
- Eugenio Mataracı
- Ferran Simen
- Xavier Hayasaka
- Luis Moray
- Marcos Rudaski
Comment on lines +60 to +110
Contributor

The "Available Speaker Voices" section, similar to the one in the live-text-to-speech.mdx file, lists the speaker voices. It's crucial to keep this list updated and consistent across all documentation. As previously mentioned, categorizing the voices could significantly improve user experience.

7 changes: 7 additions & 0 deletions mint.json
@@ -80,6 +80,13 @@
"chapters/speech-to-text-api/pages/speaker-recognition"
]
},
{
"group": "Text to Speech API",
"pages": [
"chapters/text-to-speech-api/pages/live-text-to-speech",
"chapters/text-to-speech-api/pages/text-to-speech"
]
},
{
"group": "Audio Intelligence",
"pages": [