From e9b263cac2eb239d3d443dcd73985bf664bf534f Mon Sep 17 00:00:00 2001
From: karamouche
Date: Wed, 11 Feb 2026 15:29:44 -0500
Subject: [PATCH 1/2] Add pii redaction documentation

---
 chapters/audio-intelligence/index.mdx         |  8 +++
 chapters/audio-intelligence/pii-redaction.mdx | 57 +++++++++++++++++++
 chapters/pre-recorded-stt/features/index.mdx  |  7 +++
 docs.json                                     |  1 +
 4 files changed, 73 insertions(+)
 create mode 100644 chapters/audio-intelligence/pii-redaction.mdx

diff --git a/chapters/audio-intelligence/index.mdx b/chapters/audio-intelligence/index.mdx
index a147226..0c6e055 100644
--- a/chapters/audio-intelligence/index.mdx
+++ b/chapters/audio-intelligence/index.mdx
@@ -32,6 +32,14 @@ Use these capabilities alongside Live or Pre-recorded STT to automate workflows
     Detect and categorize key entities like people, organizations, dates, and more.
+
+  Automatically redact names, emails, vehicle IDs, and other PII in pre-recorded transcripts.
+
+
+```json Pre-recorded
+{
+  "audio_url": "YOUR_AUDIO_URL",
+  "pii_redaction": true
+}
+```
+
+
+## Optional configuration
+
+You can customize the behavior with `pii_redaction_config`:
+
+
+  Preset or list of PII entity types to redact (e.g. `["GDPR"]`).
+  See [Named Entity Recognition](/chapters/audio-intelligence/named-entity-recognition#supported-entities) for supported entity types.
+
+
+  How to replace detected PII: `"MARKER"` (placeholder labels like `[EMAIL_1]`) or `"MASK"` (masked characters).
+
+
+## Example body
+
+```json
+{
+  "audio_url": "YOUR_AUDIO_URL",
+  "pii_redaction": true,
+  "pii_redaction_config": {
+    "entity_types": ["GDPR"],
+    "processed_text_type": "MARKER"
+  }
+}
+```
+
+## Example output
+
+**Before (raw transcript):**
+
+> Hi, I'm calling about the order for John Smith. Can you confirm the delivery to john.smith@company.com? Yes, John Smith placed it yesterday.
+
+**After (with PII redaction):**
+
+> Hi, I'm calling about the order for [NAME_1]. Can you confirm the delivery to [EMAIL_1]? Yes, [NAME_1] placed it yesterday.
+
+The same entity mentioned multiple times receives the **same marker ID** (e.g. "John Smith" becomes [NAME_1] both times), so you can track references across the transcript while keeping sensitive data redacted. \
+This consistency is also useful for downstream tasks using LLMs, which can reason about entities (e.g. "the person in [NAME_1]") without ever seeing the raw PII.
diff --git a/chapters/pre-recorded-stt/features/index.mdx b/chapters/pre-recorded-stt/features/index.mdx
index 5d75942..88248c9 100644
--- a/chapters/pre-recorded-stt/features/index.mdx
+++ b/chapters/pre-recorded-stt/features/index.mdx
@@ -13,6 +13,13 @@ The core functionality of the Gladia API is its Speech Recognition model, design
   >
     Detect speakers and understand who said what, and when.
+
+  Automatically redact names, emails, vehicle IDs, and other PII in pre-recorded transcripts.
+

Date: Wed, 11 Feb 2026 15:42:56 -0500
Subject: [PATCH 2/2] improved explanation on different processed text type for PII redaction

---
 chapters/audio-intelligence/pii-redaction.mdx | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/chapters/audio-intelligence/pii-redaction.mdx b/chapters/audio-intelligence/pii-redaction.mdx
index 9589167..16f6187 100644
--- a/chapters/audio-intelligence/pii-redaction.mdx
+++ b/chapters/audio-intelligence/pii-redaction.mdx
@@ -26,8 +26,10 @@ You can customize the behavior with `pii_redaction_config`:
   Preset or list of PII entity types to redact (e.g. `["GDPR"]`).
   See [Named Entity Recognition](/chapters/audio-intelligence/named-entity-recognition#supported-entities) for supported entity types.
-
-  How to replace detected PII: `"MARKER"` (placeholder labels like `[EMAIL_1]`) or `"MASK"` (masked characters).
+
+  How to replace detected PII:
+  - **`MARKER`**: Placeholder labels like `[NAME_1]`, `[EMAIL_1]`. Same entity will have same ID.
+  - **`MASK`**: Each character replaced by a mask (e.g. "John Smith" → `#### #####`)

 ## Example body
@@ -45,11 +47,15 @@ You can customize the behavior with `pii_redaction_config`:
 ## Example output

-**Before (raw transcript):**
+**Without PII redaction (raw transcript):**

 > Hi, I'm calling about the order for John Smith. Can you confirm the delivery to john.smith@company.com? Yes, John Smith placed it yesterday.

-**After (with PII redaction):**
+**With PII redaction (`processed_text_type="MASK"`):**
+
+> Hi, I'm calling about the order for #### #####. Can you confirm the delivery to ######################? Yes, #### ##### placed it yesterday.
+
+**With PII redaction (`processed_text_type="MARKER"`):**

 > Hi, I'm calling about the order for [NAME_1]. Can you confirm the delivery to [EMAIL_1]? Yes, [NAME_1] placed it yesterday.