Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

[Unreleased]

[1.1.0] - 2025-12-31

A complete rewrite of the streaming architecture for reliability, plus new features for handling long documents.

Added

Streaming Architecture (v1.0 Core)

Binary streaming protocol for gapless audio playback (eliminates file I/O between Python and TypeScript)
Ring buffer implementation (src/audio/ring-buffer.ts) for smooth audio delivery
Explicit state machine (src/streaming/state-machine.ts) for streaming: IDLE → BUFFERING → PLAYING → DRAINING → FINISHED
Pull-based audio player (src/audio/stream-player.ts) using node-speaker (replaces afplay subprocess spawning)
Binary protocol reader (src/bridge/binary-reader.ts) for efficient chunk parsing
Stream orchestrator (src/streaming/orchestrator.ts) coordinating generation, buffering, and playback

Operational Infrastructure

Killswitch system (~/.chatter/.killswitch) for emergency stops
Structured decision logging with logDecision() for debugging critical paths
Comprehensive health check system (speak health) with JSON output option
Server auto-shutdown after 1 hour of idle (only TTS operations reset timer)

Long Document Support (v1.1)

Auto-chunking for long documents (--auto-chunk, --chunk-size) - splits at sentence boundaries
Resume capability for interrupted generations (--resume <manifest>, --keep-chunks)
Generation manifest (src/core/manifest.ts) tracking chunk status for reliable recovery
Progressive chunk saving to disk during generation (partial output preserved on timeout/error)
Configurable timeout (--timeout <seconds>, default 300s, 0 for unlimited)

Batch Processing

Batch mode for multiple input files (--output-dir, --skip-existing, --stop-on-error)
Batch utilities (src/core/batch.ts) for preparing inputs and summarizing results

New Commands and Options

speak concat <files...> --out <output> - Concatenate audio files using sox
speak health - System health check with pass/fail status
--estimate - Show duration estimate without generating
--dry-run - Preview what would happen without generating
Progress indicator showing chunk counts and ETA during generation

Documentation

SKILL.md for agent-facing documentation (simple, opinionated interface)
Updated README with global installation instructions

Changed

--output now accepts both file paths (with .wav extension) and directories
Version bumped from 0.1.0 to 1.1.0
Default timeout reduced from unlimited to 300 seconds

Fixed

Streaming Bugs (v1.0)

Buffer view corruption: Buffer.from(arrayBuffer) created views that got corrupted when underlying Float32Array was reused. Fixed by allocating new buffers and copying data for each audio chunk push.
Socket buffer reuse: Bun may reuse buffers passed to data event callbacks. Fixed by copying socket data immediately in binary-reader.ts.
Short text streaming failure: State machine went BUFFERING → DRAINING without starting player. Fixed by adding player.start() call when entering DRAINING from BUFFERING.
Streaming success=false: After waitForFinish(), no BUFFER_EMPTY event was dispatched, leaving state as DRAINING. Fixed by dispatching BUFFER_EMPTY after playback completes.
Streaming hang on long content: for await loop blocked while handleChunk() waited for buffer drain. Fixed with producer-consumer pattern for concurrent socket reading and chunk processing.

v1.1 Bug Fixes (found during testing)

Process hanging after completion: socket.end() doesn't force close the socket. Changed to socket.destroy() in client.ts.
Concat --output option conflict: -o, --output conflicted with parent command. Renamed to --out for concat subcommand.
--resume requiring input text: Resume handling was after input validation. Moved to beginning of action handler.

Technical Details

Architecture Changes

BEFORE (v0.1):
  Bun CLI → Bridge → Python → afplay (subprocess per chunk)
  - Three processes own playback state
  - File I/O for every chunk
  - 50-100ms gaps between chunks
  
AFTER (v1.0+):
  Bun CLI (state machine + ring buffer + speaker) ← Binary stream ← Python
  - Single owner for playback state
  - No file I/O for streaming
  - Gapless audio playback

State Machine States

IDLE: Initial state, waiting to start
BUFFERING: Accumulating initial buffer before playback (3s default)
PLAYING: Actively playing audio
REBUFFERING: Paused playback due to low buffer, waiting for more data
DRAINING: Generation complete, playing remaining buffer
FINISHED: Playback complete
ERROR: Error occurred

New Files

src/audio/ring-buffer.ts      - Lock-free audio sample ring buffer
src/audio/stream-player.ts    - Streaming audio player using node-speaker
src/audio/device.ts           - Audio device detection
src/streaming/state-machine.ts - Streaming state machine
src/streaming/orchestrator.ts  - Stream coordination
src/bridge/binary-reader.ts    - Binary protocol reader
src/bridge/binary-protocol.ts  - Binary protocol types
src/core/killswitch.ts        - Emergency stop mechanism
src/core/health.ts            - Health check system
src/core/chunker.ts           - Text chunking at sentence boundaries
src/core/concatenate.ts       - Sox-based audio concatenation
src/core/manifest.ts          - Generation state tracking for resume
src/core/batch.ts             - Batch processing utilities
src/core/estimate.ts          - Duration estimation
src/python/binary_protocol.py  - Python binary protocol writer

Dependencies

Added

speaker ^0.5.4 - Node.js native audio playback

System Requirements

sox required for auto-chunking and concat: brew install sox
portaudio required for node-speaker: brew install portaudio

[0.1.0] - 2025-12-26

Added

Initial release
Text-to-speech using Chatterbox TTS on Apple Silicon
Daemon mode for faster subsequent generations
Voice cloning with --voice <sample.wav>
Markdown processing modes (plain/smart)
Code block handling (read/skip/placeholder)
Auto-setup on first run
Shell completions (bash, zsh, fish)
Emotion tags support ([laugh], [sigh], etc.)

Known Issues

Streaming mode has audio gaps between chunks (fixed in v1.0)
No progress indicator for long files (fixed in v1.1)
No resume capability for interrupted generations (fixed in v1.1)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Changelog

[Unreleased]

[1.1.0] - 2025-12-31

Added

Streaming Architecture (v1.0 Core)

Operational Infrastructure

Long Document Support (v1.1)

Batch Processing

New Commands and Options

Documentation

Changed

Fixed

Streaming Bugs (v1.0)

v1.1 Bug Fixes (found during testing)

Technical Details

Architecture Changes

State Machine States

New Files

Dependencies

Added

System Requirements

[0.1.0] - 2025-12-26

Added

Known Issues

Uh oh!

Releases: EmZod/speak

v1.1.0

Changelog

[Unreleased]

[1.1.0] - 2025-12-31

Added

Streaming Architecture (v1.0 Core)

Operational Infrastructure

Long Document Support (v1.1)

Batch Processing

New Commands and Options

Documentation

Changed

Fixed

Streaming Bugs (v1.0)

v1.1 Bug Fixes (found during testing)

Technical Details

Architecture Changes

State Machine States

New Files

Dependencies

Added

System Requirements

[0.1.0] - 2025-12-26

Added

Known Issues

Uh oh!