A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Oct 2, 2025 - Python
A ComfyUI custom node integration for multi-engine, multilingual text-to-speech and voice conversion. Supports RVC, IndexTTS-2, Chatterbox (classic and multilingual, 23 languages), F5-TTS, Higgs Audio 2, and Microsoft VibeVoice, with unlimited text length, SRT timing, character support, Audio Analyzer, Silent Speech Analyzer, audio editing, and more.
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
Real-Time Deepfake Pipeline
Music Generation Using Deep Learning🎶🎵
AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧
AudioInsight is a web application that processes audio, generates transcriptions, and allows users to ask questions about the related audio.
An approach to Andrej Karpathy's LLM challenge, as outlined here: https://twitter.com/karpathy/status/1760740503614836917
Maya Voice AI is an open-source project that demonstrates the Maya1 model, capable of generating realistic voice audio from text input with rich emotional and descriptive control. This repository provides a demo for text-to-speech synthesis using advanced language models and the SNAC codec, focusing on high-quality audio at 24kHz.
Professional Yocto BSP Layer for Dynamic Devices Edge Computing Platforms - AI Audio Processing, E-Ink Displays, Power Management, Wireless Connectivity, i.MX8MM/i.MX93 Support
AI Audio Framework 🎵
A project that generates and extracts features from music to compare them with popular artists, and examines where and among which demographics those artists are popular, in order to craft a DIY marketing solution for aspiring artists.
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
Acoustic Space Analyzer AI Pro is a professional acoustic analysis tool that leverages artificial intelligence to generate optimized DSP processing chains for any acoustic environment. This innovative application combines real-time spectral analysis, 3D spatial scanning, and AI-powered audio processing to deliver precise acoustic corrections.
ComfyUI custom nodes for the Dia2 TTS model — generate speech, timestamps, and captions directly inside ComfyUI.
Open source AI speech generation solution
This repository implements Unsupervised Domain Adaptation using a Gradient Reversal Layer with PaSST feature extractors for cross-device acoustic scene classification on the DCASE TAU 2020 dataset.