feat: tts documentation#11

Open
ghost wants to merge 3 commits into main from feat/tts-ii

@ghost ghost commented Mar 3, 2024

Summary by CodeRabbit

  • Documentation
    • Updated the guide for the Live Text To Speech API, covering WebSocket connection, voice configuration, text synthesis, language selection, and speaker voice options.
    • Added a new guide for the Pre Recorded Text To Speech API, detailing voice selection, cloning, language support, and audio output formats.

coderabbitai bot commented Mar 3, 2024

Walkthrough

The recent update introduces comprehensive guides for utilizing two distinct Text To Speech APIs: the Live Text To Speech API and the Pre Recorded Text To Speech API. These additions cover everything from connecting to a WebSocket for live synthesis to selecting voices and languages for pre-recorded audio synthesis. This update enhances the flexibility and customization options available for users seeking high-quality, personalized audio outputs.

Changes

| File Path | Summary |
|---|---|
| .../live-text-to-speech.mdx | Guide for utilizing the Live Text To Speech API, covering WebSocket connection, speaker voice configuration, text synthesis, voice selection, supported languages, and available speaker voices. |
| .../text-to-speech.mdx | Guide on using the Pre Recorded Text To Speech API for audio synthesis from text, explaining voice cloning, pre-recorded voices, language support, and high-quality WAV/PCM audio output. |

🐰✨
In a world of sound, where words take flight,
Two guides emerge, to make it just right.
Live or pre-recorded, voices ring clear,
In languages far and those that are near.
With every syllable, a rabbit's delight,
Crafting audio tales, by day and by night.
🎶🌟

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 61581ef and 2bf19ca.
Files selected for processing (2)
  • chapters/text-to-speech-api/pages/live-text-to-speech.mdx (1 hunks)
  • chapters/text-to-speech-api/pages/text-to-speech.mdx (1 hunks)

Comment on lines +12 to +21
1. **Pre-recorded Speaker Voices**:
This mode uses a library of pre-recorded voices.
You can select a voice that suits your requirements and provide the text you wish to synthesize.
The API will then generate audio using the selected voice.
The available voices include but are not limited to Claribel Dervla, Daisy Studious, Gracie Wise, and many more.
Each voice has its unique tone and style, providing a range of options for your audio content.

2. **Cloned Speaker Voice**:
If you require a more personalized voice, this mode allows you to clone a specific voice from an audio file you provide.
This is particularly useful for creating a unique voice for your brand or for specific characters in storytelling applications.

The "Modes of Operation" section clearly differentiates between using pre-recorded speaker voices and cloning a speaker voice. For the "Cloned Speaker Voice" mode, it would be helpful to include more details on how to provide the audio file for cloning, such as acceptable formats and how to encode or reference the file in the request.


### Request Format

To use the API, send a POST request to `/text/audio/audio-synthesis/` with a JSON payload specifying the `speaker_voice_behaviour` (either `"pre recorded voice"` or `"cloned voice"`), the chosen `pre_recorded_speaker_voice` or `cloned_speaker_voice` file (if applicable), the `text` to be synthesized, and the `language`.

In the "Request Format" section, the documentation provides a concise overview of the required JSON payload structure. To improve clarity, consider adding a brief description for each field in the payload, especially for fields like speaker_voice_behaviour, to explain the expected values and their effects.
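
One way to act on this suggestion is to spell out the field rules in code. The sketch below is a hypothetical helper (`build_payload` is not part of the API) that assembles the JSON payload from the fields named in the Request Format section. It assumes that exactly one of the two voice fields is required, depending on `speaker_voice_behaviour`; the guide implies this but does not state it outright.

```python
from typing import Optional

# The two values the guide quotes for `speaker_voice_behaviour`.
PRE_RECORDED = "pre recorded voice"
CLONED = "cloned voice"


def build_payload(
    behaviour: str,
    text: str,
    language: str,
    pre_recorded_voice: Optional[str] = None,
    cloned_voice: Optional[str] = None,
) -> dict:
    """Assemble the request payload (hypothetical helper; the validation
    rules here are assumptions, not documented API behaviour)."""
    if behaviour not in (PRE_RECORDED, CLONED):
        raise ValueError(f"unknown speaker_voice_behaviour: {behaviour!r}")
    if not text:
        raise ValueError("text must not be empty")

    payload = {
        "speaker_voice_behaviour": behaviour,
        "text": text,
        "language": language,
    }
    if behaviour == PRE_RECORDED:
        if not pre_recorded_voice:
            raise ValueError("pre_recorded_speaker_voice is required")
        payload["pre_recorded_speaker_voice"] = pre_recorded_voice
    else:
        if not cloned_voice:
            raise ValueError("cloned_speaker_voice is required")
        payload["cloned_speaker_voice"] = cloned_voice
    return payload
```

Keeping the allowed values as named constants makes the unusual space-separated strings harder to mistype on the client side.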


### Response

The API responds with an audio file at 24,000 Hz in WAV/PCM format, allowing for high-quality audio output suitable for various applications, from virtual assistants to audio books.

The "Response" section clearly states the format and quality of the audio file returned by the API. For completeness, consider mentioning any potential error responses and their meanings to help users troubleshoot issues with their requests.
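
Until error responses are documented, a client can at least confirm that a response body is the audio the Response section promises. The following is a minimal sketch using Python's standard `wave` module; it assumes the body is a complete WAV file, and the helper names (`describe_wav`, `is_expected_audio`, `make_silence`) are illustrative only.

```python
import io
import wave

EXPECTED_SAMPLE_RATE = 24_000  # Hz, as stated in the Response section


def describe_wav(body: bytes) -> dict:
    """Read the WAV header from a response body and return its properties."""
    with wave.open(io.BytesIO(body), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frames": wav.getnframes(),
        }


def is_expected_audio(body: bytes) -> bool:
    """Return True if the body parses as PCM WAV at the documented rate."""
    try:
        return describe_wav(body)["sample_rate"] == EXPECTED_SAMPLE_RATE
    except (wave.Error, EOFError):
        # Not a WAV file; possibly an error response, whose format is
        # not documented in the guide under review.
        return False


def make_silence(sample_rate: int, n_frames: int) -> bytes:
    """Build an in-memory mono 16-bit WAV so the check can be tried offline."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```

A body that fails this check is a reasonable signal to log the raw response and inspect it, since it may be an error payload instead of audio.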

Comment on lines 32 to 49
Here is an example payload for a pre-recorded voice:
POST https://api.gladia.io/text/audio/audio-synthesis/
```json
{
  "speaker_voice_behaviour": "pre recorded voice",
  "pre_recorded_speaker_voice": "Claribel Dervla",
  "text": "Hello, welcome to our Text to Speech API. This is an example using a pre-recorded speaker voice.",
  "language": "en"
}
```
Here is an example payload for a cloned speaker voice:
POST https://api.gladia.io/text/audio/audio-synthesis/
```json
{
  "speaker_voice_behaviour": "cloned voice",
  "cloned_speaker_voice": "file_path_to_cloned_voice_sample.wav",
  "text": "This is an example using a cloned speaker voice.",
  "language": "en"
}
```

The "Example" section provides useful payload examples for both pre-recorded and cloned speaker voices. To enhance this section, consider adding example responses, including both successful audio file returns and examples of error responses. This would provide a more comprehensive guide for users to understand the API's behavior.
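
A client-side example could accompany the payloads as well. The sketch below builds the POST request with Python's standard `urllib`, using the URL from the example. Authentication is not described in the excerpt, so any credentials are passed through a generic `extra_headers` argument (an assumption), and because the error format is undocumented, failures are surfaced as the raw status and body without interpretation.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# URL taken from the example payloads in the guide.
SYNTHESIS_URL = "https://api.gladia.io/text/audio/audio-synthesis/"


def make_request(payload: dict, extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for a synthesis payload."""
    headers = {"Content-Type": "application/json"}
    # Assumption: authentication, if required, is supplied by the caller.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        SYNTHESIS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize(payload: dict, out_path: str, extra_headers: Optional[dict] = None) -> int:
    """Send the request and write the response body to out_path.

    Performs a real network call. Returns the number of bytes written.
    """
    req = make_request(payload, extra_headers)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # The error schema is undocumented, so report it verbatim.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"synthesis failed with HTTP {err.code}: {detail}") from err
    with open(out_path, "wb") as fh:
        fh.write(body)
    return len(body)
```

Separating `make_request` from `synthesize` lets the request be inspected or unit-tested without touching the network.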


### Supported Languages

The API supports a variety of languages, including English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Hungarian (hu), Korean (ko), Japanese (ja), and Hindi (hi). This wide range of supported languages makes it easy to create audio content for a global audience.

The "Supported Languages" section is informative and mirrors the content in the live-text-to-speech.mdx file. Consistency between documents is good, but ensure that any updates to supported languages are reflected across all relevant documentation to maintain accuracy.
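
One way to keep the language list from drifting is to hold it in a single mapping that both guides, and any client code, are checked against. A minimal sketch follows, with the codes copied from the Supported Languages section. The lower-casing in `normalize_language` is a client-side convenience and an assumption; how the API itself treats letter case is not stated.

```python
# Language codes and names as listed in the Supported Languages section.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "pl": "Polish",
    "tr": "Turkish",
    "ru": "Russian",
    "nl": "Dutch",
    "cs": "Czech",
    "ar": "Arabic",
    "zh-cn": "Chinese",
    "hu": "Hungarian",
    "ko": "Korean",
    "ja": "Japanese",
    "hi": "Hindi",
}


def normalize_language(code: str) -> str:
    """Trim and lower-case a language code, then check it against the list."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return normalized
```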

Comment on lines +58 to +108
The Live Text to Speech API offers a diverse range of speaker voices to choose from. Here is a list of available voices:

- Gitta Nikolina
- Henriette Usha
- Sofia Hellen
- Tammy Grit
- Tanja Adelina
- Vjollca Johnnie
- Andrew Chipper
- Badr Odhiambo
- Dionisio Schuyler
- Royston Min
- Viktor Eka
- Abrahan Mack
- Adde Michal
- Baldur Sanjin
- Craig Gutsy
- Damien Black
- Gilberto Mathias
- Ilkin Urbano
- Kazuhiko Atallah
- Ludvig Milivoj
- Suad Qasim
- Torcull Diarmuid
- Viktor Menelaos
- Zacharie Aimilios
- Nova Hogarth
- Maja Ruoho
- Uta Obando
- Lidiya Szekeres
- Chandra MacFarland
- Szofi Granger
- Camilla Holmström
- Lilya Stainthorpe
- Zofija Kendrick
- Narelle Moon
- Barbora MacLean
- Alexandra Hisakawa
- Alma María
- Rosemary Okafor
- Ige Behringer
- Filip Traverse
- Damjan Chapman
- Wulf Carlevaro
- Aaron Dreschner
- Kumar Dahl
- Eugenio Mataracı
- Ferran Simen
- Xavier Hayasaka
- Luis Moray
- Marcos Rudaski

The "Available Speaker Voices" section, similar to the one in the live-text-to-speech.mdx file, lists the speaker voices. It's crucial to keep this list updated and consistent across all documentation. As previously mentioned, categorizing the voices could significantly improve user experience.
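
The consistency concern could be checked mechanically instead of by eye. The sketch below is hypothetical tooling, not part of the PR: it pulls `- Name` bullets out of two Markdown snippets and reports the names that appear in only one of them. It assumes each snippet contains only the voice list, since any other bullet list in the same text would be picked up too.

```python
import re

# Matches a Markdown bullet such as "- Gitta Nikolina".
_BULLET = re.compile(r"^\s*-\s+(.+?)\s*$")


def extract_voices(markdown: str) -> list[str]:
    """Return the bullet-list entries found in a Markdown snippet, in order."""
    voices = []
    for line in markdown.splitlines():
        match = _BULLET.match(line)
        if match:
            voices.append(match.group(1))
    return voices


def compare_voice_lists(first: str, second: str) -> dict:
    """Report names present in only one of the two snippets."""
    a, b = set(extract_voices(first)), set(extract_voices(second))
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


if __name__ == "__main__":
    live = "- Gitta Nikolina\n- Tammy Grit\n- Luis Moray\n"
    pre_recorded = "- Gitta Nikolina\n- Luis Moray\n- Marcos Rudaski\n"
    print(compare_voice_lists(live, pre_recorded))
```

Run against the voice sections of both `.mdx` files, an empty result in both keys would indicate the lists match.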

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 2bf19ca and db9d931.
Files ignored due to path filters (1)
  • mint.json is excluded by: !**/*.json
Files selected for processing (2)
  • chapters/text-to-speech-api/pages/live-text-to-speech.mdx (1 hunks)
  • chapters/text-to-speech-api/pages/text-to-speech.mdx (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • chapters/text-to-speech-api/pages/live-text-to-speech.mdx
  • chapters/text-to-speech-api/pages/text-to-speech.mdx

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between db9d931 and b19d146.
Files selected for processing (1)
  • chapters/text-to-speech-api/pages/live-text-to-speech.mdx (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • chapters/text-to-speech-api/pages/live-text-to-speech.mdx
