@graniet (Owner) commented Jun 15, 2025

Adds first-class audio handling to the LLM stack and a runnable voice-assistant example:

  • New MessageType::Audio with helpers (has_audio, audio_data) and a cond!(has_audio) macro.
  • AgentBuilder / ChatWithMemory now accept an optional STT provider and launch an async transcription pipeline using Whisper.
  • Back-end updates (OpenAI, Anthropic, Google, Azure) gracefully ignore audio when building API payloads.
  • Memory tweaks: audio messages are filtered out of sliding-window recall to keep prompts clean.
  • Example agent_audio_example.rs: records PCM via cpal, encodes WAV with hound, shows live conversation in a ratatui/crossterm TUI, and drives a two-agent (transcriber → assistant) workflow.
  • Feature-gated dependencies (cpal, hound, ratatui, crossterm, anyhow, parking_lot) under audio-example to avoid bloat for non-audio builds.
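To make the first bullet concrete, here is a minimal sketch of what an audio-carrying message type with `has_audio` and `audio_data` helpers could look like, and how a text-only backend might skip audio when assembling a payload. The `ChatMessage` struct, its fields, and `text_payload` are hypothetical stand-ins for illustration; the crate's actual definitions may differ.

```rust
/// Hypothetical, simplified message kinds (the real enum may have more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum MessageType {
    Text,
    /// Raw audio bytes, for example an encoded WAV clip.
    Audio(Vec<u8>),
}

/// Hypothetical message struct used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub content: String,
    pub message_type: MessageType,
}

impl ChatMessage {
    /// Returns true when the message carries audio.
    pub fn has_audio(&self) -> bool {
        matches!(self.message_type, MessageType::Audio(_))
    }

    /// Returns the audio bytes, or None for non-audio messages.
    pub fn audio_data(&self) -> Option<&[u8]> {
        match &self.message_type {
            MessageType::Audio(bytes) => Some(bytes.as_slice()),
            MessageType::Text => None,
        }
    }
}

/// Collects the text of every non-audio message, mimicking a backend
/// that ignores audio when building its API payload.
pub fn text_payload(messages: &[ChatMessage]) -> Vec<&str> {
    messages
        .iter()
        .filter(|m| !m.has_audio())
        .map(|m| m.content.as_str())
        .collect()
}
```

Returning `Option<&[u8]>` from `audio_data` lets callers borrow the clip without copying it, which matters when clips are several seconds of PCM.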

@graniet changed the title from "Add speech to text feature for agent builder" to "Add voice-assistant support & TUI demo" on Jun 15, 2025
@reneleonhardt
Wow, great work @graniet 🚀
What is missing? Could a CI workflow be added to check for regressions?

@graniet requested a review from Copilot on June 21, 2025 18:25
Copilot AI left a comment

Pull Request Overview

This PR adds voice-assistant support and a terminal UI demo to the LLM stack. Key changes include:

  • Introducing a new audio message type with helper methods and a cond!(has_audio) macro.
  • Enhancing memory and chat wrappers to filter out and transcribe audio messages by integrating an optional STT provider.
  • Adding a runnable TUI-based voice-assistant example along with feature-gated audio dependencies.
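The transcription pipeline mentioned above can be pictured roughly as follows. This is only a sketch under stated assumptions: it uses a plain thread and a `std::sync::mpsc` channel in place of the async tasks the PR describes, and the `SttProvider` trait, `FakeStt`, and `spawn_transcriber` are hypothetical names rather than the crate's API.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical speech-to-text interface; the real provider trait may differ.
pub trait SttProvider: Send + Sync {
    fn transcribe(&self, audio: &[u8]) -> Result<String, String>;
}

/// Dummy provider standing in for Whisper: it only reports the clip size.
pub struct FakeStt;

impl SttProvider for FakeStt {
    fn transcribe(&self, audio: &[u8]) -> Result<String, String> {
        if audio.is_empty() {
            Err("empty audio buffer".to_string())
        } else {
            Ok(format!("<{} bytes of speech>", audio.len()))
        }
    }
}

/// Spawns a worker that transcribes every clip it receives and appends
/// the resulting text to shared memory. Failed clips are reported and skipped.
pub fn spawn_transcriber(
    stt: Arc<dyn SttProvider>,
    memory: Arc<Mutex<Vec<String>>>,
) -> (mpsc::Sender<Vec<u8>>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for clip in rx {
            match stt.transcribe(&clip) {
                // unwrap only panics if another thread poisoned the lock.
                Ok(text) => memory.lock().unwrap().push(text),
                Err(e) => eprintln!("STT error: {e}"),
            }
        }
    });
    (tx, handle)
}
```

A caller would send clips through the returned sender, drop it when recording stops, and join the handle before reading the transcripts from memory.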

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/memory/sliding_window.rs | Filters out audio messages from the sliding window memory recall. |
| src/memory/mod.rs & cond_macros.rs | Adds the new HasAudio condition and helper documentation. |
| src/memory/chat_wrapper.rs | Integrates an STT provider to spawn a transcription pipeline for audio messages. |
| src/chat/mod.rs | Introduces new methods to work with the Audio messageType for audio processing. |
| src/builder.rs; src/agent/builder.rs | Updates agent and builder constructions to support optional STT providers. |
| src/backends/* | Updates backend implementations to gracefully ignore audio messages. |
| examples/agent_audio_example.rs | Provides a complete demo integrating recording, transcription, audio processing, and terminal UI. |
| Cargo.toml | Adds optional dependencies for audio processing and the TUI example. |

Comments suppressed due to low confidence (1)

src/memory/sliding_window.rs:182

  • Consider adding a comment to explain why audio messages are filtered out in the sliding window recall, to clarify the intent for future maintainers.
        messages.retain(|m| !m.has_audio());
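
As an illustration of the filtering discussed in this comment, the sketch below shows a bounded window whose recall step drops audio entries and carries the kind of explanatory comment the reviewer asks for. The `Message` and `SlidingWindow` types here are hypothetical and much simpler than the crate's; whether the real code filters before or after applying the window limit is not visible in this excerpt.

```rust
use std::collections::VecDeque;

/// Hypothetical message: text plus an optional raw audio clip.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub text: String,
    pub audio: Option<Vec<u8>>,
}

impl Message {
    pub fn has_audio(&self) -> bool {
        self.audio.is_some()
    }
}

/// Hypothetical fixed-capacity memory that evicts the oldest message first.
pub struct SlidingWindow {
    capacity: usize,
    messages: VecDeque<Message>,
}

impl SlidingWindow {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, messages: VecDeque::with_capacity(capacity) }
    }

    pub fn remember(&mut self, message: Message) {
        if self.capacity == 0 {
            return;
        }
        if self.messages.len() == self.capacity {
            self.messages.pop_front();
        }
        self.messages.push_back(message);
    }

    pub fn recall(&self) -> Vec<Message> {
        let mut messages: Vec<Message> = self.messages.iter().cloned().collect();
        // Raw audio cannot be rendered into a text prompt, and its
        // transcript is stored as a separate message, so drop it here.
        messages.retain(|m| !m.has_audio());
        messages
    }
}
```

Note that in this sketch audio entries still occupy window slots until evicted, so a recall can return fewer messages than the capacity.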


let mut guard = memory.write().await;
if let Err(e) = guard.remember_with_role(&transcribed_msg, event.role.clone()).await {
eprintln!("STT memory save error: {}", e);
Copilot AI Jun 21, 2025

Consider replacing eprintln with a proper logging framework to ensure consistent error handling across production code.

Comment on lines +273 to +275
writer.write_sample(s).unwrap();
}
writer.finalize().unwrap();
Copilot AI Jun 21, 2025

Consider handling errors explicitly when writing WAV samples instead of using unwrap to prevent potential panics during audio processing.

Suggested change
-     writer.write_sample(s).unwrap();
- }
- writer.finalize().unwrap();
+     if let Err(e) = writer.write_sample(s) {
+         eprintln!("Error writing WAV sample: {e}");
+         return Vec::new(); // Return an empty buffer on error
+     }
+ }
+ if let Err(e) = writer.finalize() {
+     eprintln!("Error finalizing WAV file: {e}");
+     return Vec::new(); // Return an empty buffer on error
+ }
