Releases: EmZod/speak
Releases · EmZod/speak
v1.1.0
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
[Unreleased]
[1.1.0] - 2025-12-31
A complete rewrite of the streaming architecture for reliability, plus new features for handling long documents.
Added
Streaming Architecture (v1.0 Core)
- Binary streaming protocol for gapless audio playback (eliminates file I/O between Python and TypeScript)
- Ring buffer implementation (
src/audio/ring-buffer.ts) for smooth audio delivery - Explicit state machine (
src/streaming/state-machine.ts) for streaming: IDLE → BUFFERING → PLAYING → DRAINING → FINISHED - Pull-based audio player (
src/audio/stream-player.ts) using node-speaker (replaces afplay subprocess spawning) - Binary protocol reader (
src/bridge/binary-reader.ts) for efficient chunk parsing - Stream orchestrator (
src/streaming/orchestrator.ts) coordinating generation, buffering, and playback
Operational Infrastructure
- Killswitch system (
~/.chatter/.killswitch) for emergency stops - Structured decision logging with
logDecision()for debugging critical paths - Comprehensive health check system (
speak health) with JSON output option - Server auto-shutdown after 1 hour of idle (only TTS operations reset timer)
Long Document Support (v1.1)
- Auto-chunking for long documents (
--auto-chunk,--chunk-size) - splits at sentence boundaries - Resume capability for interrupted generations (
--resume <manifest>,--keep-chunks) - Generation manifest (
src/core/manifest.ts) tracking chunk status for reliable recovery - Progressive chunk saving to disk during generation (partial output preserved on timeout/error)
- Configurable timeout (
--timeout <seconds>, default 300s, 0 for unlimited)
Batch Processing
- Batch mode for multiple input files (
--output-dir,--skip-existing,--stop-on-error) - Batch utilities (
src/core/batch.ts) for preparing inputs and summarizing results
New Commands and Options
speak concat <files...> --out <output>- Concatenate audio files using soxspeak health- System health check with pass/fail status--estimate- Show duration estimate without generating--dry-run- Preview what would happen without generating- Progress indicator showing chunk counts and ETA during generation
Documentation
- SKILL.md for agent-facing documentation (simple, opinionated interface)
- Updated README with global installation instructions
Changed
--outputnow accepts both file paths (with .wav extension) and directories- Version bumped from 0.1.0 to 1.1.0
- Default timeout reduced from unlimited to 300 seconds
Fixed
Streaming Bugs (v1.0)
- Buffer view corruption:
Buffer.from(arrayBuffer)created views that got corrupted when underlying Float32Array was reused. Fixed by allocating new buffers and copying data for each audio chunk push. - Socket buffer reuse: Bun may reuse buffers passed to data event callbacks. Fixed by copying socket data immediately in
binary-reader.ts. - Short text streaming failure: State machine went BUFFERING → DRAINING without starting player. Fixed by adding
player.start()call when entering DRAINING from BUFFERING. - Streaming success=false: After
waitForFinish(), no BUFFER_EMPTY event was dispatched, leaving state as DRAINING. Fixed by dispatching BUFFER_EMPTY after playback completes. - Streaming hang on long content:
for awaitloop blocked whilehandleChunk()waited for buffer drain. Fixed with producer-consumer pattern for concurrent socket reading and chunk processing.
v1.1 Bug Fixes (found during testing)
- Process hanging after completion:
socket.end()doesn't force close the socket. Changed tosocket.destroy()inclient.ts. - Concat --output option conflict:
-o, --outputconflicted with parent command. Renamed to--outfor concat subcommand. - --resume requiring input text: Resume handling was after input validation. Moved to beginning of action handler.
Technical Details
Architecture Changes
BEFORE (v0.1):
Bun CLI → Bridge → Python → afplay (subprocess per chunk)
- Three processes own playback state
- File I/O for every chunk
- 50-100ms gaps between chunks
AFTER (v1.0+):
Bun CLI (state machine + ring buffer + speaker) ← Binary stream ← Python
- Single owner for playback state
- No file I/O for streaming
- Gapless audio playback
State Machine States
- IDLE: Initial state, waiting to start
- BUFFERING: Accumulating initial buffer before playback (3s default)
- PLAYING: Actively playing audio
- REBUFFERING: Paused playback due to low buffer, waiting for more data
- DRAINING: Generation complete, playing remaining buffer
- FINISHED: Playback complete
- ERROR: Error occurred
New Files
src/audio/ring-buffer.ts - Lock-free audio sample ring buffer
src/audio/stream-player.ts - Streaming audio player using node-speaker
src/audio/device.ts - Audio device detection
src/streaming/state-machine.ts - Streaming state machine
src/streaming/orchestrator.ts - Stream coordination
src/bridge/binary-reader.ts - Binary protocol reader
src/bridge/binary-protocol.ts - Binary protocol types
src/core/killswitch.ts - Emergency stop mechanism
src/core/health.ts - Health check system
src/core/chunker.ts - Text chunking at sentence boundaries
src/core/concatenate.ts - Sox-based audio concatenation
src/core/manifest.ts - Generation state tracking for resume
src/core/batch.ts - Batch processing utilities
src/core/estimate.ts - Duration estimation
src/python/binary_protocol.py - Python binary protocol writer
Dependencies
Added
speaker^0.5.4 - Node.js native audio playback
System Requirements
soxrequired for auto-chunking and concat:brew install soxportaudiorequired for node-speaker:brew install portaudio
[0.1.0] - 2025-12-26
Added
- Initial release
- Text-to-speech using Chatterbox TTS on Apple Silicon
- Daemon mode for faster subsequent generations
- Voice cloning with
--voice <sample.wav> - Markdown processing modes (plain/smart)
- Code block handling (read/skip/placeholder)
- Auto-setup on first run
- Shell completions (bash, zsh, fish)
- Emotion tags support ([laugh], [sigh], etc.)
Known Issues
- Streaming mode has audio gaps between chunks (fixed in v1.0)
- No progress indicator for long files (fixed in v1.1)
- No resume capability for interrupted generations (fixed in v1.1)