Add Smart UI Auto-Detection, Dynamic Voice System, and .env Support #37
Open
kiralpoon wants to merge 34 commits into NVIDIA:main from
Conversation
add containerized support for personaplex
…x for Blackwell GPUs.
Install build dependencies and libopus-dev to fix Docker build
- Add --save-voice-embeddings CLI flag to offline.py for generating custom voice prompt embeddings from WAV files
- Remove torch < 2.5 upper bound to allow PyTorch 2.10+ for RTX 5090
- Add missing pyloudnorm dependency required for audio normalization
- Update README with conda setup instructions, Blackwell GPU guide, and custom voice creation tutorial
- Update .gitignore for Claude Code local settings
…ads model layers to CPU when GPU memory is insufficient.
Add libopus-dev to installation prerequisites
Add low VRAM feature
fix: reduce memory need during model init
- Add python-dotenv dependency to pyproject.toml
- Load environment variables from .env file in server.py and offline.py
- Add warning when .env exists but HF_TOKEN is not set
- Create .env.example template for users
- Update README.md with .env configuration instructions

The .env file is optional and all existing workflows continue to work. Users can now configure HF_TOKEN via .env file, environment variable, or huggingface-cli login.
Implement dynamic voice discovery system that allows users to add custom voices without modifying code. Voices automatically appear in the Web UI dropdown after generating embeddings and restarting the server.

Backend changes:
- Add VoiceDiscovery service to scan configured voice directories
- Add /api/voices REST endpoint returning voice list with metadata
- Support custom voices directory (configurable via CUSTOM_VOICE_DIR)
- Only list .pt embedding files (not .wav source audio)

Frontend changes:
- Add useVoices React hook for dynamic voice fetching
- Update Queue and ModelParams components to use dynamic voice loading
- Add loading and error states for better UX
- Custom voices appear first in dropdown

Infrastructure:
- Add custom_voices/ directory with comprehensive README
- Update .gitignore to exclude voice files but keep directory structure
- Add TROUBLESHOOTING.md documenting common issues
- Update README.md with installation, server, and custom voice docs

Key fixes applied during implementation:
- Package must be installed in editable mode (pip install -e .) for dev
- Server needs --static client/dist flag to serve local frontend builds
- API routes must be registered before static routes in aiohttp
- Critical: Only .pt files are selectable voices (not .wav source files)
Server now automatically detects and serves custom UI from client/dist without requiring --static flag, simplifying development workflow. Falls back to HuggingFace default UI when custom build is unavailable. Adds comprehensive documentation including QUICKSTART.md and FRONTEND_DEVELOPMENT.md guides.
- Remove .env file configuration instructions
- Update to use export HF_TOKEN instead
- Change CUSTOM_VOICE_DIR docs to use export
- Keep only environment variable and huggingface-cli login methods
- Add practical examples for UI auto-detection
- Add example for custom voice workflow
- Show expected log output for verification
Update main branch with clean PR version:
- Smart UI auto-detection feature
- Dynamic custom voice system
- All .env references removed
- Complete documentation added
- Add .env.example template with HF_TOKEN and CUSTOM_VOICE_DIR
- Update documentation to recommend .env file as primary method
- .env files are automatically loaded by server and offline scripts
- Maintains backward compatibility with export and huggingface-cli methods

Benefits of .env over export will be explained in PR description.
- Ignore .agent/ directory
- Ignore Agents.md
- Ignore Claude.local.md

These are personal tooling files that should never be in the repository.
Resolved conflicts in README.md and moshi/moshi/offline.py:
- README.md: Preserved our three-option auth setup (.env, export, CLI) and auto-detection documentation
- offline.py: Kept save_embeddings feature (our addition) vs upstream's hardcoded False

Our additions preserved:
- .env file support for easier token management
- UI auto-detection with detailed documentation
- Custom voice system with dynamic loading
- save-voice-embeddings feature in offline mode
- QUICKSTART.md and FRONTEND_DEVELOPMENT.md guides
Successfully merged upstream/main with our feature branch containing:
- UI auto-detection system
- Dynamic custom voice loading
- .env file support for token management

All conflicts resolved, tests passed, ready for upstream PR.
This PR adds three quality-of-life improvements that simplify PersonaPlex setup and development workflow while maintaining full backward compatibility.
Features
1. Smart UI Auto-Detection
Problem: Developers had to manually specify the `--static client/dist` flag to use custom UI builds.

Solution: The server automatically detects and serves the custom UI from `client/dist` when available, falling back to the default HuggingFace UI when not present.

Benefits:
- No `--static` flag needed for local development builds

On startup the server logs which UI it selected, so the fallback behavior is easy to verify.
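As a rough sketch, the detection can be as simple as checking for a built index.html before falling back; the function below is illustrative only, not the PR's actual code:

```python
from pathlib import Path
from typing import Optional

def resolve_static_dir(cli_static: Optional[str]) -> Optional[Path]:
    """Pick which UI to serve: an explicit --static argument wins,
    then a local client/dist build; None means fall back to the
    default HuggingFace-hosted UI."""
    if cli_static is not None:
        return Path(cli_static)
    local_build = Path("client") / "dist"
    if (local_build / "index.html").is_file():
        return local_build
    return None
```

Keeping the explicit flag as the first branch is what preserves the manual override noted under Backward Compatibility below.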
2. Dynamic Custom Voice System
Problem: Adding custom voices required code changes and server restarts to make them available in the Web UI.
Solution: An automatic voice discovery system scans voice directories and exposes all voices via the `/api/voices` endpoint.

Benefits:
- Custom voices appear in the UI without code changes
- `CUSTOM_VOICE_DIR` environment variable for custom locations

Voice file support:
- `.pt` files (voice embeddings) - appear in the UI dropdown
- `.wav` files (source audio) - used for generating embeddings

API endpoint: `/api/voices` returns a categorized voice list with metadata.
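A minimal backend sketch under the assumption of aiohttp (which the commit log names); the function names, directory defaults, and JSON shape are assumptions rather than the PR's actual implementation:

```python
import os
from pathlib import Path

from aiohttp import web

def discover_voices(builtin_dir: Path, custom_dir: Path) -> dict:
    """List only .pt embedding files; .wav source audio is skipped."""
    def scan(d: Path) -> list:
        return sorted(p.stem for p in d.glob("*.pt")) if d.is_dir() else []
    # Custom voices come first, matching the dropdown ordering in the PR.
    return {"custom": scan(custom_dir), "builtin": scan(builtin_dir)}

async def handle_voices(request: web.Request) -> web.Response:
    custom_dir = Path(os.environ.get("CUSTOM_VOICE_DIR", "custom_voices"))
    return web.json_response(discover_voices(Path("voices"), custom_dir))

app = web.Application()
# Per the commit log, API routes must be registered before static
# routes, or a catch-all static handler shadows /api/voices.
app.router.add_get("/api/voices", handle_voices)
```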
3. .env File Support
Problem: Users must run `export HF_TOKEN=...` in every terminal session or configure `huggingface-cli login` globally.

Solution: Support for `.env` files using the `python-dotenv` library.

Why .env is better: the token is written once, next to the project, and picked up automatically by both the server and the offline scripts.

Example comparison:
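The comparison boils down to: `export HF_TOKEN=...` has to be repeated in every new shell, while a `.env` file is written once and loaded automatically. A minimal sketch of that loading with python-dotenv's `load_dotenv`; the warning mirrors the one described in the commit log:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# load_dotenv() returns True if a .env file was found and parsed;
# variables already exported in the shell are not overridden by default.
found = load_dotenv()
if found and not os.environ.get("HF_TOKEN"):
    print("Warning: .env file found but HF_TOKEN is not set")
```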
Changes
Backend:
- `moshi/moshi/server.py`: UI auto-detection logic, voice discovery, .env loading
- `moshi/moshi/offline.py`: .env loading, save-voice-embeddings support
- `moshi/moshi/voice_discovery.py`: New VoiceDiscovery class for scanning voice files
- `moshi/pyproject.toml`: Added python-dotenv dependency

Frontend:
- `client/src/hooks/useVoices.ts`: New hook for fetching voices from API
- `client/src/components/ModelParams/ModelParams.tsx`: Dynamic voice dropdown
- `client/src/pages/Queue/Queue.tsx`: Integrated voice selection

Documentation:
- `.env.example`: Template with HF_TOKEN and CUSTOM_VOICE_DIR
- `README.md`: Updated with all three features
- `QUICKSTART.md`: Fast setup guide for new users
- `FRONTEND_DEVELOPMENT.md`: Frontend development workflow guide
- `TROUBLESHOOTING.md`: Common issues and solutions
- `custom_voices/README.md`: Custom voice creation guide

Backward Compatibility
✅ Fully backward compatible - no breaking changes:
- `--static` flag still works for manual override
- `export HF_TOKEN` still works
- `huggingface-cli login` still works
- `.env` file is optional

Testing
UI Auto-Detection:
- `client/dist` present - serves custom UI
- No `client/dist` - falls back to HuggingFace UI
- `--static` override works

Voice System:
- `.pt` voice files appear in dropdown
- `CUSTOM_VOICE_DIR` environment variable works
- `/api/voices` endpoint returns correct JSON

.env Support:
Statistics
Example Usage
UI Auto-Detection
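Build the frontend into `client/dist` and start the server with no extra flags: the custom UI is served automatically. Remove or rename `client/dist` and the server falls back to the HuggingFace default; passing `--static` still forces a specific directory.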
Custom Voices
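Per the commit log, the workflow is: generate a `.pt` embedding from a WAV with offline.py's `--save-voice-embeddings` flag, drop it into the custom voices directory, restart the server, and the voice appears in the dropdown. A quick way to verify from Python (the port here is an assumption; use whatever your server binds to):

```python
import json
import urllib.request

# Hypothetical check that a newly added voice is visible; assumes the
# server listens on localhost:8998, which may differ in your setup.
with urllib.request.urlopen("http://localhost:8998/api/voices") as resp:
    voices = json.load(resp)
print(voices)  # custom voices are listed first, per the PR description
```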
.env Setup
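A plausible `.env`, based on the two keys the PR says `.env.example` contains; both values below are placeholders, and the CUSTOM_VOICE_DIR path is only a guess:

```
HF_TOKEN=your_huggingface_token_here
CUSTOM_VOICE_DIR=./custom_voices
```

With that in place, the server and offline scripts pick the token up automatically; `export HF_TOKEN` and `huggingface-cli login` keep working as before.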
Related Issues
This PR addresses user experience friction mentioned in community discussions around setup and development workflow.
All features have been tested and verified working. Ready for review and merge. 🚀