Add Smart UI Auto-Detection, Dynamic Voice System, and .env Support#37

Open
kiralpoon wants to merge 34 commits into NVIDIA:main from kiralpoon:pr-all-features

This PR adds three quality-of-life improvements that simplify PersonaPlex setup and development workflow while maintaining full backward compatibility.

Features

1. Smart UI Auto-Detection

Problem: Developers had to pass the --static client/dist flag manually to use custom UI builds.

Solution: The server automatically detects and serves a custom UI from client/dist when one is available, and falls back to the default HuggingFace UI when it is not.

Benefits:

  • Zero configuration - just build and run
  • Automatic fallback to default UI
  • Clear log messages showing which UI is being served
  • Optional manual override with --static flag

Logs example:

Found custom UI at .../client/dist, using it instead of default
static_path = /home/.../personaplex-blackwell/client/dist
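The detection order described above (explicit --static first, then client/dist, then the default UI) can be sketched as follows. This is a minimal illustration, not the code in server.py: the function name resolve_static_dir, the project_root parameter, and the check for index.html are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_static_dir(static_arg: Optional[str], project_root: Path) -> Optional[Path]:
    """Pick the directory to serve the web UI from.

    Returns None when the caller should fall back to the default UI.
    Hypothetical helper: names and the index.html check are assumptions.
    """
    # 1. An explicit --static value always wins (manual override).
    if static_arg is not None:
        return Path(static_arg)

    # 2. Otherwise look for a local build at <project_root>/client/dist.
    candidate = project_root / "client" / "dist"
    if (candidate / "index.html").is_file():
        print(f"Found custom UI at {candidate}, using it instead of default")
        return candidate

    # 3. No local build found: signal the fallback to the default UI.
    return None
```

Returning None for the fallback case keeps the decision in one place, so the caller only has to branch once on whether a local directory was found.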

2. Dynamic Custom Voice System

Problem: Adding custom voices required code changes before they became available in the Web UI.

Solution: An automatic voice discovery system scans the voice directories at server start and exposes all voices via the /api/voices endpoint.

Benefits:

  • Drop voice files and restart - no code changes needed
  • Supports both HuggingFace cache and custom directories
  • CUSTOM_VOICE_DIR environment variable for custom locations
  • REST API for programmatic voice listing
  • Frontend automatically populates dropdown from API

Voice file support:

  • .pt files (voice embeddings) - appear in UI dropdown
  • .wav files (source audio) - used for generating embeddings

API endpoint:

curl http://localhost:8998/api/voices

Returns a categorized voice list with metadata.
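To make the discovery step concrete, here is a rough sketch of scanning two directories for .pt embeddings and grouping them by origin. It is not the VoiceDiscovery class from this PR: the function names, the "custom"/"builtin" keys, the per-voice fields, and the custom_voices default are all hypothetical, and the real JSON shape may differ.

```python
import os
from pathlib import Path
from typing import Dict, List


def custom_voice_dir(default: str = "custom_voices") -> Path:
    """Resolve the custom voice directory (the default value is an assumption)."""
    return Path(os.environ.get("CUSTOM_VOICE_DIR", default))


def discover_voices(builtin_dir: Path, custom_dir: Path) -> Dict[str, List[dict]]:
    """Scan both directories for .pt embeddings and group them by origin."""

    def scan(directory: Path, category: str) -> List[dict]:
        if not directory.is_dir():
            return []  # a missing directory simply contributes no voices
        return [
            {"name": p.stem, "file": p.name, "category": category}
            for p in sorted(directory.glob("*.pt"))  # .wav sources are skipped
        ]

    # Custom voices come first so a dropdown built from this dict lists them on top.
    return {
        "custom": scan(custom_dir, "custom"),
        "builtin": scan(builtin_dir, "builtin"),
    }
```

Only .pt files are matched, which mirrors the rule that .wav source audio is never offered as a selectable voice.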

3. .env File Support

Problem: Users must export HF_TOKEN=... in every terminal session or configure huggingface-cli login globally.

Solution: Support for .env files using the python-dotenv library.

Why .env is Better:

| Feature | export | .env |
|---|---|---|
| Persistence | Lost on terminal close | Set once, works forever |
| Loading | Manual before each run | Automatic |
| Security | Appears in shell history | Never in command history |
| Project-specific | Global to session | Per-project isolation |
| Documentation | No standard way | .env.example template |
| Multiple vars | Export each separately | All in one file |

Example comparison:

# Old way (export):
export HF_TOKEN=hf_xxxxx
export CUSTOM_VOICE_DIR=./my_voices
python -m moshi.server --ssl "$SSL_DIR"
# Must repeat every terminal session

# New way (.env):
cp .env.example .env
# Edit .env once, then just:
python -m moshi.server --ssl "$SSL_DIR"
# Works in every terminal forever
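For readers unfamiliar with .env loading, the behaviour can be illustrated with a small standard-library sketch. The PR itself relies on python-dotenv, whose load_dotenv() leaves already-set variables untouched by default (override=False); whether the server calls it with those defaults is an assumption here, and load_env_file below is a hypothetical stand-in, not the library's parser.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict:
    """Load KEY=VALUE pairs from a .env file into os.environ.

    Returns the pairs that were actually applied. A missing file is not an
    error, and variables already set in the environment are never replaced.
    """
    applied = {}
    if not path.is_file():
        return applied  # .env is optional
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if key and key not in os.environ:
            os.environ[key] = value
            applied[key] = value
    return applied
```

Under this precedence an explicit export HF_TOKEN=... in the shell still wins over the file, which is what keeps the existing export workflow working unchanged.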

Changes

Backend:

  • moshi/moshi/server.py: UI auto-detection logic, voice discovery, .env loading
  • moshi/moshi/offline.py: .env loading, save-voice-embeddings support
  • moshi/moshi/voice_discovery.py: New VoiceDiscovery class for scanning voice files
  • moshi/pyproject.toml: Added python-dotenv dependency

Frontend:

  • client/src/hooks/useVoices.ts: New hook for fetching voices from API
  • client/src/components/ModelParams/ModelParams.tsx: Dynamic voice dropdown
  • client/src/pages/Queue/Queue.tsx: Integrated voice selection

Documentation:

  • .env.example: Template with HF_TOKEN and CUSTOM_VOICE_DIR
  • README.md: Updated with all three features
  • QUICKSTART.md: Fast setup guide for new users
  • FRONTEND_DEVELOPMENT.md: Frontend development workflow guide
  • TROUBLESHOOTING.md: Common issues and solutions
  • custom_voices/README.md: Custom voice creation guide

Backward Compatibility

✅ Fully backward compatible - no breaking changes:

  • All existing workflows continue to work
  • --static flag still works for manual override
  • export HF_TOKEN still works
  • huggingface-cli login still works
  • .env file is optional
  • Hardcoded voice prompts still work

Testing

UI Auto-Detection:

  • ✅ With client/dist present - serves custom UI
  • ✅ Without client/dist - falls back to HuggingFace UI
  • ✅ Manual --static override works
  • ✅ Log messages correctly indicate which UI is served

Voice System:

  • ✅ Pre-packaged voices (NATF, NATM, VARF, VARM) load correctly
  • ✅ Custom .pt voice files appear in dropdown
  • CUSTOM_VOICE_DIR environment variable works
  • /api/voices endpoint returns correct JSON
  • ✅ Frontend dropdown populates automatically
  • ✅ Voice selection persists across sessions

.env Support:

  • ✅ Server starts successfully with .env file
  • ✅ Server starts successfully without .env file (fallback)
  • ✅ HF_TOKEN loads from .env
  • ✅ CUSTOM_VOICE_DIR loads from .env
  • ✅ No warnings when .env is missing
  • ✅ No breaking changes to existing workflows

Statistics

  • 16 files changed: 1,323 insertions, 47 deletions
  • New files: 6 (voice_discovery.py, useVoices.ts, 4 documentation files)
  • Modified files: 10
  • New dependencies: 1 (python-dotenv)
  • Breaking changes: 0

Example Usage

UI Auto-Detection

# Build frontend
cd client && npm run build && cd ..

# Server auto-detects - no flag needed!
SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR"
# Logs: "Found custom UI at .../client/dist, using it instead of default"

Custom Voices

# Add voice file
cp my_voice.wav custom_voices/

# Generate embeddings
python -m moshi.offline --voice-prompt "my_voice.wav" \
  --save-voice-embeddings --input-wav "assets/test/input_assistant.wav" --output-wav "/tmp/out.wav"

# Restart server - voice appears in UI automatically!

.env Setup

# One-time setup
cp .env.example .env
# Edit .env and add: HF_TOKEN=your_token_here

# All commands now work without export
SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR"

Related Issues

This PR addresses user experience friction mentioned in community discussions about:

  • Simplifying frontend development workflow
  • Making custom voices easier to add
  • Reducing setup friction for new users

All features have been tested and verified working. Ready for review and merge. 🚀

rajarshiroy-nvidia and others added 30 commits January 15, 2026 10:52
add containerized support for personaplex

Install build dependencies and libopus-dev to fix Docker build

- Add --save-voice-embeddings CLI flag to offline.py for generating
  custom voice prompt embeddings from WAV files
- Remove torch < 2.5 upper bound to allow PyTorch 2.10+ for RTX 5090
- Add missing pyloudnorm dependency required for audio normalization
- Update README with conda setup instructions, Blackwell GPU guide,
  and custom voice creation tutorial
- Update .gitignore for Claude Code local settings

…ads model layers to CPU when GPU memory is insufficient.

Add libopus-dev to installation prerequisites

fix: reduce memory need during model init

- Add python-dotenv dependency to pyproject.toml
- Load environment variables from .env file in server.py and offline.py
- Add warning when .env exists but HF_TOKEN is not set
- Create .env.example template for users
- Update README.md with .env configuration instructions

The .env file is optional and all existing workflows continue to work.
Users can now configure HF_TOKEN via .env file, environment variable,
or huggingface-cli login.

Implement dynamic voice discovery system that allows users to add custom
voices without modifying code. Voices automatically appear in the Web UI
dropdown after generating embeddings and restarting the server.

Backend changes:
- Add VoiceDiscovery service to scan configured voice directories
- Add /api/voices REST endpoint returning voice list with metadata
- Support custom voices directory (configurable via CUSTOM_VOICE_DIR)
- Only list .pt embedding files (not .wav source audio)

Frontend changes:
- Add useVoices React hook for dynamic voice fetching
- Update Queue and ModelParams components to use dynamic voice loading
- Add loading and error states for better UX
- Custom voices appear first in dropdown

Infrastructure:
- Add custom_voices/ directory with comprehensive README
- Update .gitignore to exclude voice files but keep directory structure
- Add TROUBLESHOOTING.md documenting common issues
- Update README.md with installation, server, and custom voice docs

Key fixes applied during implementation:
- Package must be installed in editable mode (pip install -e .) for dev
- Server needs --static client/dist flag to serve local frontend builds
- API routes must be registered before static routes in aiohttp
- Critical: Only .pt files are selectable voices (not .wav source files)

Server now automatically detects and serves custom UI from client/dist without requiring --static flag, simplifying development workflow. Falls back to HuggingFace default UI when custom build is unavailable. Adds comprehensive documentation including QUICKSTART.md and FRONTEND_DEVELOPMENT.md guides.

- Remove .env file configuration instructions
- Update to use export HF_TOKEN instead
- Change CUSTOM_VOICE_DIR docs to use export
- Keep only environment variable and huggingface-cli login methods
- Add practical examples for UI auto-detection
- Add example for custom voice workflow
- Show expected log output for verification

Update main branch with clean PR version:
- Smart UI auto-detection feature
- Dynamic custom voice system
- All .env references removed
- Complete documentation added

- Add .env.example template with HF_TOKEN and CUSTOM_VOICE_DIR
- Update documentation to recommend .env file as primary method
- .env files are automatically loaded by server and offline scripts
- Maintains backward compatibility with export and huggingface-cli methods

Benefits of .env over export will be explained in PR description.

- Ignore .agent/ directory
- Ignore Agents.md
- Ignore Claude.local.md

These are personal tooling files that should never be in the repository.

Resolved conflicts in README.md and moshi/moshi/offline.py:
- README.md: Preserved our three-option auth setup (.env, export, CLI) and auto-detection documentation
- offline.py: Kept save_embeddings feature (our addition) vs upstream's hardcoded False

Our additions preserved:
- .env file support for easier token management
- UI auto-detection with detailed documentation
- Custom voice system with dynamic loading
- save-voice-embeddings feature in offline mode
- QUICKSTART.md and FRONTEND_DEVELOPMENT.md guides

Successfully merged upstream/main with our feature branch containing:
- UI auto-detection system
- Dynamic custom voice loading
- .env file support for token management

All conflicts resolved, tests passed, ready for upstream PR.

6 participants