From 4a78423bd6121c4902ae56180e7075baf9e65aeb Mon Sep 17 00:00:00 2001
From: karamouche
Date: Wed, 28 Jan 2026 14:47:48 -0500
Subject: [PATCH 1/2] feat: add dedicated page to explain endpointing

---
 chapters/live-stt/features/endpointing.mdx | 47 ++++++++++++++++++++++
 chapters/live-stt/features/index.mdx       |  8 ++++
 docs.json                                  |  1 +
 3 files changed, 56 insertions(+)
 create mode 100644 chapters/live-stt/features/endpointing.mdx

diff --git a/chapters/live-stt/features/endpointing.mdx b/chapters/live-stt/features/endpointing.mdx
new file mode 100644
index 0000000..18eb594
--- /dev/null
+++ b/chapters/live-stt/features/endpointing.mdx
@@ -0,0 +1,47 @@
+---
+title: "Endpointing"
+description: "What endpointing is and how it works"
+---
+
+Endpointing is the mechanism Gladia uses in live transcription to decide when a speaker has "finished" an utterance, so the API can close that utterance and emit a final transcript segment.
+
+In practice, endpointing answers the question: "How much silence should we wait for before considering the sentence (or turn) complete?"
+
+### Why endpointing matters
+
+Endpointing is one of the main knobs that control the tradeoff between:
+- **Latency (speed)**: how quickly you get final utterances
+- **Completeness**: whether you avoid cutting someone off mid-thought
+- **Chunking quality**: whether utterances align well with natural turns or sentences
+
+Lower endpointing values feel "snappier" (great for voice agents), while higher values tend to produce cleaner, more complete segments (great for meetings and lectures).
+
+### How it works conceptually
+
+During a live session, Gladia continuously analyzes the incoming audio stream and:
+1. Detects speech activity on each channel (voice activity detection)
+2. Groups speech into an "utterance" while speech is ongoing
+3. When it observes silence lasting at least *endpointing* seconds, it considers the utterance finished and finalizes it.
+4. The AI model is then used to transcribe the final result of the utterance.
+5. If speech never pauses long enough, Gladia still has a safety mechanism to close the utterance (*maximum_duration_without_endpointing*, see next section)
+
+You can also subscribe to speech activity messages to know when speech starts and ends (useful to drive UI or agent turn-taking)
+
+### The 2 key parameters
+
+**endpointing (seconds)** \
+Definition: the duration of silence that closes the current utterance.
+- Default: 0.05
+- Range: 0.01 to 10
+
+Effect:
+- Smaller value = closes utterances faster, but can split sentences if the speaker hesitates briefly.
+- Larger value = waits longer before finalizing, which improves segment completeness but increases latency.
+
+**maximum_duration_without_endpointing (seconds)**
+
+Definition: the maximum amount of time Gladia will keep an utterance open without detecting endpointing silence. If that limit is reached, the utterance is considered finished anyway.
+- Default: 5
+- Range: 5 to 60
+
+Why it exists: it prevents extremely long, never-ending utterances (for example: constant background noise, a speaker who never pauses, or long monologues), which is important for downstream UX and processing stability.
diff --git a/chapters/live-stt/features/index.mdx b/chapters/live-stt/features/index.mdx
index 8b3287d..05e1bc5 100644
--- a/chapters/live-stt/features/index.mdx
+++ b/chapters/live-stt/features/index.mdx
@@ -7,6 +7,14 @@ description: "Core features of Gladia's real-time speech-to-text (STT) API"
+
+  Control how long to wait for silence before closing an utterance.
+
+

Date: Fri, 30 Jan 2026 10:35:15 -0500
Subject: [PATCH 2/2] endpointing page: add links to speech detection events

---
 chapters/live-stt/features/endpointing.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/live-stt/features/endpointing.mdx b/chapters/live-stt/features/endpointing.mdx
index 18eb594..b7a2c0f 100644
--- a/chapters/live-stt/features/endpointing.mdx
+++ b/chapters/live-stt/features/endpointing.mdx
@@ -25,7 +25,7 @@ During a live session, Gladia continuously analyzes the incoming audio stream an
 4. The AI model is then used to transcribe the final result of the utterance.
 5. If speech never pauses long enough, Gladia still has a safety mechanism to close the utterance (*maximum_duration_without_endpointing*, see next section)
 
-You can also subscribe to speech activity messages to know when speech starts and ends (useful to drive UI or agent turn-taking)
+You can also subscribe to speech activity messages to know when speech [starts](https://docs.gladia.io/api-reference/v2/live/callback/speech-start) and [ends](https://docs.gladia.io/api-reference/v2/live/callback/speech-end) (useful to drive UI or agent turn-taking)
 
 ### The 2 key parameters
 
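For review context, the conceptual steps described on the new page can be sketched as a tiny frame-based simulator. This is an illustrative model of the documented behavior only, not Gladia's implementation: the frame size, function name, and return shape are all hypothetical, and the per-frame booleans stand in for the voice activity detection of step 1.

```python
# Illustrative model of the endpointing rules described on the page --
# NOT Gladia's implementation. `frames` is hypothetical per-frame VAD
# output (True = speech detected in that frame).

def segment(frames, endpointing=0.05, max_dur=5.0, frame_s=0.01):
    """Return finalized utterances as (start_frame, end_frame) pairs."""
    need_silence = round(endpointing / frame_s)  # silent frames that close an utterance
    max_frames = round(max_dur / frame_s)        # maximum_duration_without_endpointing cap
    utterances, start, silence = [], None, 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            if start is None:
                start = i                        # steps 1-2: speech opens an utterance
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= need_silence:          # step 3: enough silence -> finalize,
                utterances.append((start, i + 1 - silence))  # ending where speech stopped
                start, silence = None, 0
        if start is not None and i + 1 - start >= max_frames:
            utterances.append((start, i + 1))    # step 5: safety cap kicks in
            start, silence = None, 0
    if start is not None:                        # stream ended mid-utterance
        utterances.append((start, len(frames)))
    return utterances

# 100 ms of speech, 100 ms of silence, 50 ms of speech (10 ms frames):
# the default endpointing of 0.05 s closes the first utterance after
# five consecutive silent frames.
print(segment([True] * 10 + [False] * 10 + [True] * 5))  # [(0, 10), (20, 25)]
```

Raising `endpointing` in this model merges segments that a small value would split, which mirrors the latency/completeness tradeoff the page describes.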
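Since the two parameters have hard ranges (0.01–10 s and 5–60 s), a client can validate them before opening a session. A minimal sketch: the two field names come from the page, but the helper, its defaults beyond the documented ones, and the example values for each use case are illustrative assumptions, and the exact request schema should be checked against the API reference.

```python
# Hypothetical helper: validate endpointing settings against the
# documented ranges before sending them in a live-session config.
# Only the two field names are taken from the docs.

def live_session_config(endpointing: float = 0.05,
                        max_without_endpointing: float = 5.0) -> dict:
    if not 0.01 <= endpointing <= 10:
        raise ValueError("endpointing must be within 0.01-10 seconds")
    if not 5 <= max_without_endpointing <= 60:
        raise ValueError(
            "maximum_duration_without_endpointing must be within 5-60 seconds")
    return {
        "endpointing": endpointing,
        "maximum_duration_without_endpointing": max_without_endpointing,
    }

# A voice agent wants snappy turn-taking; a meeting recorder tolerates
# more latency for cleaner segments (illustrative values, in range).
agent_cfg = live_session_config(endpointing=0.05)
meeting_cfg = live_session_config(endpointing=0.8, max_without_endpointing=30)
```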