Add Filler Word Removal Option to Post-Processing #3

@eddmann

Feature Request: Filler Word Removal

Summary

Add an optional filler word removal feature to post-processing that removes common speech disfluencies (um, uh, ah, like, you know, etc.) from transcriptions.

Motivation

When transcribing natural speech, filler words and hesitations are captured verbatim, which can make transcriptions harder to read and less professional. Users should have the option to automatically clean these up during post-processing.

Use Cases

  • Professional Documentation: Cleaning up meeting notes, interviews, or dictation for formal documents
  • Content Creation: Preparing podcast/video transcripts for publication
  • Accessibility: Creating cleaner captions for videos
  • Note-taking: Making voice notes more readable and concise

Proposed Solution

1. Add Filler Removal Toggle in Settings

Add a new toggle in the Settings → Service tab under post-processing options:

  • For WhisperKit: "Remove Filler Words" (below MLX post-processing toggle)
  • For OpenAI: "Remove Filler Words" (below GPT-4o-mini post-processing toggle)

Store preference in UserDefaults:

  • whisperkit_remove_fillers (Bool)
  • openai_remove_fillers (Bool)
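
A minimal sketch of how these two keys might be read and written. The `FillerRemovalSettings` wrapper and its method names are hypothetical and not part of the existing codebase; only the key strings come from this proposal.

```swift
import Foundation

// Hypothetical wrapper around the two proposed UserDefaults keys.
struct FillerRemovalSettings {
    enum Key: String {
        case whisperKit = "whisperkit_remove_fillers"
        case openAI = "openai_remove_fillers"
    }

    let defaults: UserDefaults

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    // bool(forKey:) returns false for a missing key,
    // so filler removal stays off until the user opts in.
    func isEnabled(_ key: Key) -> Bool {
        defaults.bool(forKey: key.rawValue)
    }

    func setEnabled(_ enabled: Bool, for key: Key) {
        defaults.set(enabled, forKey: key.rawValue)
    }
}
```

Injecting the `UserDefaults` instance keeps the wrapper testable against a throwaway suite instead of `.standard`.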

2. Implementation Approaches

Option A: Enhanced Post-Processing Prompts (Recommended)

  • Update MLXService and OpenAIService post-processing prompts to include filler removal
  • Prompt addition: "Remove filler words such as 'um', 'uh', 'ah', 'like', 'you know', while preserving the meaning and natural flow."
  • Pros: Leverages existing LLM capabilities, contextually aware
  • Cons: Requires post-processing to be enabled, adds slight processing time
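
As a rough sketch of Option A, the instruction could be appended to whatever prompt the services already use. The function name and the `base` parameter are assumptions; the existing prompts in MLXService and OpenAIService are not shown in this issue.

```swift
// The instruction text proposed above, kept in one place so both services share it.
let fillerRemovalInstruction =
    "Remove filler words such as 'um', 'uh', 'ah', 'like', 'you know', " +
    "while preserving the meaning and natural flow."

// Hypothetical helper: returns the base prompt unchanged when the toggle is off,
// otherwise appends the filler-removal instruction on a new line.
func postProcessingPrompt(base: String, removeFillers: Bool) -> String {
    guard removeFillers else { return base }
    return base + "\n" + fillerRemovalInstruction
}
```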

Option B: Regex-Based Pre-Processing

  • Create new FillerRemovalService that uses pattern matching
  • Apply before or after main post-processing
  • Common patterns: \b(um|uh|ah|hmm|er|like)\b for single words, plus phrases such as "you know", "kind of", and "sort of"
  • Pros: Fast, works without post-processing, predictable
  • Cons: May remove valid uses (e.g., "I like pizza"), less context-aware
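
One possible shape for the regex pass, sketched with `NSRegularExpression`. The type name `RegexFillerRemover` is a placeholder (the proposal calls the service FillerRemovalService), and the pattern is an assumption that deliberately leaves out "like" to avoid the "I like pizza" problem.

```swift
import Foundation

// Hypothetical regex-only remover for unambiguous hesitation sounds.
struct RegexFillerRemover {
    // Matches um/umm, uh/uhh, ah/ahh, er/err, hmm/hmmm as whole words,
    // plus any trailing commas, periods, and whitespace.
    private static let fillerRegex = try! NSRegularExpression(
        pattern: #"\b(?:um+|uh+|ah+|er+|hmm+)\b[,.]*\s*"#,
        options: [.caseInsensitive]
    )

    func removeFillers(from text: String) -> String {
        let range = NSRange(text.startIndex..., in: text)
        let stripped = Self.fillerRegex.stringByReplacingMatches(
            in: text, options: [], range: range, withTemplate: ""
        )
        return stripped.trimmingCharacters(in: .whitespaces)
    }
}

let remover = RegexFillerRemover()
print(remover.removeFillers(from: "Um, I think uh we should go"))
// prints "I think we should go"
```

This sketch does not re-capitalize the first word after a removed sentence-initial filler, and case-insensitive matching would also strip an abbreviation such as "ER"; both would need handling in a real implementation.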

Option C: Hybrid Approach

  • Simple regex for obvious fillers (um, uh, ah, er, hmm)
  • LLM post-processing for contextual fillers (like, you know, kind of)
  • Pros: Potentially the best balance of speed and accuracy, since the regex pass is cheap and the LLM only has to judge ambiguous cases
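
The hybrid flow could look roughly like the following. `HybridFillerCleaner` and its `contextualCleanup` closure are hypothetical; the closure stands in for the LLM call so the sketch runs without MLXService or OpenAIService.

```swift
import Foundation

// Hypothetical two-stage cleaner: a cheap regex pass for obvious fillers,
// then an injected step for contextual ones (the LLM in the real app).
struct HybridFillerCleaner {
    let contextualCleanup: (String) -> String

    func clean(_ text: String) -> String {
        // Stage 1: strip um/uh/ah/er/hmm with trailing punctuation and spaces.
        let withoutObvious = text.replacingOccurrences(
            of: #"\b(?:um|uh|ah|er|hmm)\b[,.]*\s*"#,
            with: "",
            options: [.regularExpression, .caseInsensitive]
        )
        // Stage 2: hand the remainder to the context-aware step.
        return contextualCleanup(withoutObvious)
    }
}
```

Passing the identity closure `{ $0 }` gives regex-only behavior, which is one way filler removal could still work when LLM post-processing is disabled.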

3. Filler Words to Target

Common disfluencies:

  • um, uh, ah, er, hmm, uhh, umm, ahh, err
  • like (only when used as a filler, not as a verb or preposition)
  • you know, I mean, well
  • kind of, sort of, basically
  • actually (when overused)
  • so (sentence starter filler)

Considerations:

  • Preserve meaning: "I like cats" should NOT become "I cats"
  • Context matters: "Um... no" might be better as just "No"
  • Repeated words: "the the door" → "the door"
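
The repeated-word case can be handled with a backreference. This is a sketch under the assumption that any immediately repeated word is a stutter; the function name is hypothetical.

```swift
import Foundation

// Collapses a word that is immediately repeated ("the the door" -> "the door").
// \1 refers back to the first captured word; matching is case-insensitive.
func collapseRepeatedWords(in text: String) -> String {
    text.replacingOccurrences(
        of: #"\b(\w+)(?:\s+\1\b)+"#,
        with: "$1",
        options: [.regularExpression, .caseInsensitive]
    )
}

print(collapseRepeatedWords(in: "open the the door"))
// prints "open the door"
```

Legitimate repetitions such as "I know that that is wrong" or "she had had enough" would also be collapsed, so this rule may be better left to the LLM pass or guarded by an exception list.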

4. UI Mockup

Settings → Service → WhisperKit:
┌─────────────────────────────────────────┐
│ ☑ Enable AI Post-Processing             │
│   MLX Model: Llama 3.2 3B [▼]           │
│   ☑ Remove Filler Words                 │
│                                         │
│ Note: Removes speech disfluencies such  │
│ as "um", "uh", and filler "like".       │
└─────────────────────────────────────────┘

Technical Considerations

  1. Dependency: Should filler removal require post-processing to be enabled, or work independently?
  2. Order of Operations:
    • Transcribe → Remove fillers → Post-process formatting?
    • Or: Transcribe → Post-process (including filler removal)?
  3. Customization: Should users be able to customize which words are considered fillers?
  4. Performance: Regex matching is fast and deterministic; LLM-based removal is slower but context-aware
  5. Accuracy: How to handle edge cases without changing intended meaning?
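
If customization (point 3) is pursued, a user-supplied word list could be compiled into a single pattern. This is only a sketch: `makeFillerRegex` and `strip` are hypothetical names, and how the list would be stored or edited is left open.

```swift
import Foundation

// Hypothetical helper: builds one regex from a user-editable filler list.
// Returns nil for an empty list so callers can skip the pass entirely.
func makeFillerRegex(words: [String]) throws -> NSRegularExpression? {
    let alternatives = words
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
        // Longest entries first so phrases are tried before shorter entries.
        .sorted { $0.count > $1.count }
        // Escape user input so characters like "." or "(" are matched literally.
        .map { NSRegularExpression.escapedPattern(for: $0) }
    guard !alternatives.isEmpty else { return nil }
    let pattern = "\\b(?:" + alternatives.joined(separator: "|") + ")\\b[,.]*\\s*"
    return try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
}

// Removes every match of the compiled filler regex from the text.
func strip(_ regex: NSRegularExpression, from text: String) -> String {
    let range = NSRange(text.startIndex..., in: text)
    return regex.stringByReplacingMatches(
        in: text, options: [], range: range, withTemplate: ""
    )
}
```

For example, a list of `["um", "you know"]` would turn "Um, it is, you know, done" into "it is, done". Escaping the entries matters because the list is user input and may contain regex metacharacters.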

Implementation Tasks

  • Decide on implementation approach (A, B, or C)
  • Add filler removal toggles to SettingsView
  • Update MLXService post-processing prompt (if Option A)
  • Update OpenAIService post-processing prompt (if Option A)
  • Create FillerRemovalService (if Option B or C)
  • Add UserDefaults keys for settings persistence
  • Update AppState to pass filler removal preference to services
  • Add unit tests for filler removal logic
  • Update CLAUDE.md and README.md with new feature
  • Test with various speech patterns and accents

Related Files

  • Services/MLXService.swift - MLX post-processing
  • Services/OpenAIService.swift - OpenAI post-processing
  • UI/Settings/SettingsView.swift - Settings UI
  • Core/Application/AppState.swift - State management

Questions for Discussion

  1. Should this be a separate feature or integrated into existing post-processing?
  2. What's the preferred implementation approach?
  3. Should there be a "Custom Filler Words" input field for power users?
  4. Should we show before/after comparison in the UI?

Labels: enhancement, feature-request, post-processing
Priority: Medium
Complexity: Medium
