This project is a simple, user-friendly voice conversion tool. It aims to deliver high-quality voice conversion with optimal performance, so users can change voices smoothly and naturally.
- Music Source Separation (MDX-Net / Demucs / VR)
- Voice Conversion (Single File Conversion / Batch Conversion / Conversion with Whisper / Text Conversion)
- Apply Audio Effects
- Generate Training Data (From URL)
- Model Training (v1 / v2, high-quality encoder, power training)
- Model Fusion
- Read Model Information
- Export Model to ONNX
- Download from Available Model Hub
- Search Models from Web
- Pitch Extraction
- ONNX Model Inference for Audio Conversion
- ONNX RVC models will also support index inference
- Real-time Voice Conversion
- Generate Training Reference
Pitch Extraction Methods: pm-ac, pm-cc, pm-shs, dio, mangio-crepe-tiny, mangio-crepe-small, mangio-crepe-medium, mangio-crepe-large, mangio-crepe-full, crepe-tiny, crepe-small, crepe-medium, crepe-large, crepe-full, fcpe, fcpe-legacy, fcpe-previous, rmvpe, rmvpe-clipping, rmvpe-medfilt, rmvpe-clipping-medfilt, harvest, yin, pyin, swipe, piptrack, penn, mangio-penn, djcm, djcm-clipping, djcm-medfilt, djcm-clipping-medfilt, swift, pesto
Embedding Extraction Models: contentvec_base, hubert_base, vietnamese_hubert_base, japanese_hubert_base, korean_hubert_base, chinese_hubert_base, portuguese_hubert_base, spin-v1, spin-v2, whisper-tiny, whisper-tiny.en, whisper-base, whisper-base.en, whisper-small, whisper-small.en, whisper-medium, whisper-medium.en, whisper-large-v1, whisper-large-v2, whisper-large-v3, whisper-large-v3-turbo
- Embedding extraction models support the following modes: fairseq, onnx, transformers, spin, whisper.
- All pitch extraction methods have ONNX-accelerated versions, except for methods that run through a wrapper.
- Pitch extraction methods can be combined with weights to produce a different result, for example: hybrid[rmvpe+harvest].
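This README does not say how hybrid weighting is implemented internally. Purely as an illustration, the sketch below assumes that a spec such as hybrid[rmvpe+harvest] lists method names joined by "+", and that the per-frame F0 contours (in Hz, with 0 meaning unvoiced) are merged by a weighted average over the voiced estimates. parse_hybrid and combine_f0 are hypothetical helpers, not the project's API.

```python
import re
from typing import Dict, List, Optional, Sequence


def parse_hybrid(spec: str) -> List[str]:
    """Split a spec like 'hybrid[rmvpe+harvest]' into ['rmvpe', 'harvest'] (hypothetical format)."""
    match = re.fullmatch(r"hybrid\[([^\]]+)\]", spec.strip())
    if match is None:
        raise ValueError(f"not a hybrid spec: {spec!r}")
    return [name.strip() for name in match.group(1).split("+") if name.strip()]


def combine_f0(
    contours: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted per-frame average of F0 contours; 0.0 marks an unvoiced frame and is skipped."""
    names = list(contours)
    n_frames = min(len(c) for c in contours.values())  # trim to the shortest contour
    weights = weights or {name: 1.0 for name in names}  # equal weights when none are given
    combined = []
    for i in range(n_frames):
        total = 0.0
        weight_sum = 0.0
        for name in names:
            f0 = contours[name][i]
            if f0 > 0:  # only voiced estimates contribute; methods missing from weights count as 0
                w = weights.get(name, 0.0)
                total += w * f0
                weight_sum += w
        combined.append(total / weight_sum if weight_sum > 0 else 0.0)
    return combined


if __name__ == "__main__":
    methods = parse_hybrid("hybrid[rmvpe+harvest]")
    f0 = combine_f0(
        {"rmvpe": [100.0, 0.0, 220.0], "harvest": [110.0, 150.0, 0.0]},
        {"rmvpe": 0.75, "harvest": 0.25},
    )
    print(methods, f0)
```

Skipping unvoiced frames means one method's "silence" does not drag the other method's estimate toward 0 Hz; a real implementation might instead use medians or confidence scores.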
A usage guide will be added if I ever have free time...
Step 1: Install the necessary dependencies
- Install Python from the official site, https://www.python.org/ (the project has been tested on Python 3.10.x and 3.11.x).
- Install FFmpeg from https://ffmpeg.org/ and add it to the system PATH.
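Before moving on, you can confirm both prerequisites from Python. This is a minimal sketch: it only checks that the interpreter is one of the versions this README lists as tested and that an ffmpeg executable is reachable on PATH (shutil.which). The helper names are hypothetical.

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple

TESTED_PYTHON = {(3, 10), (3, 11)}  # versions listed as tested in this README


def python_version_ok(version: Optional[Tuple[int, ...]] = None) -> bool:
    """True if the running interpreter (or the given version tuple) is a tested release."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor in TESTED_PYTHON


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    return out.stdout.splitlines()[0] if out.stdout.strip() else exe


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(tested)" if python_version_ok() else "(untested)")
    print("FFmpeg:", ffmpeg_version() or "not found on PATH")
```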
Step 2: Install the project (using Git or by downloading it from GitHub)
Using Git:
- git clone https://github.com/terastudio-org/TeraStudio-RVC
- cd TeraStudio-RVC
Downloading from GitHub:
- Go to https://github.com/terastudio-org/TeraStudio-RVC
- Click the green <> Code button and select Download ZIP.
- Extract TeraStudio-RVC-main.zip.
- Open the TeraStudio-RVC-main folder, type cmd in the address bar, and press Enter.
Step 3: Install required libraries:
Enter the commands:
python -m venv env
env\Scripts\activate
python -m pip install uv
uv pip install six packaging python-dateutil platformdirs pywin32 onnxconverter_common wget
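To check that the commands above took effect, the sketch below tests whether the current interpreter is running inside a virtual environment (inside a venv, sys.prefix differs from sys.base_prefix) and whether uv is on PATH. venv_python is a hypothetical helper that returns where python -m venv env places the interpreter on Windows and on POSIX systems.

```python
import shutil
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix points at the env, sys.base_prefix at the base Python."""
    return sys.prefix != sys.base_prefix


def venv_python(env_dir: str = "env") -> Path:
    """Interpreter path inside a venv created with `python -m venv <env_dir>`."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("expected interpreter:", venv_python())
    print("uv on PATH:", shutil.which("uv") is not None)
```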
Installation for different devices
For CPU
uv pip install -r requirements.txt
For CUDA
You can replace cu118 with the newer cu128 if your GPU supports it:
uv pip install numpy==1.26.4 numba==0.61.0
uv pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu118
uv pip install -r requirements.txt
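After installing the CUDA build, a quick sanity check is to ask PyTorch whether it can see the GPU. The sketch uses only torch.cuda.is_available, torch.version.cuda and torch.cuda.get_device_name, and it does not fail if torch is missing. pytorch_index_url is a hypothetical helper that rebuilds the index URL from the install command above for a given CUDA tag.

```python
import importlib.util


def pytorch_index_url(cuda_tag: str = "cu118") -> str:
    """Index URL from the install command above; pass e.g. 'cu128' for newer GPUs/drivers."""
    return f"https://download.pytorch.org/whl/{cuda_tag}"


def describe_torch() -> str:
    """Summarize the installed PyTorch build without raising if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        return (
            f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"GPU: {torch.cuda.get_device_name(0)}"
        )
    return f"torch {torch.__version__}, CUDA not available (CPU only)"


if __name__ == "__main__":
    print(pytorch_index_url("cu128"))
    print(describe_torch())
```

If the output says CUDA is not available after a CUDA install, the CPU wheel was probably picked up; reinstalling torch with the --index-url shown above usually fixes it.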
For OPENCL (AMD)
uv pip install numpy==1.26.4 numba==0.61.0
uv pip install torch==2.6.0 torchaudio==2.6.0 torchvision
uv pip install https://github.com/artyom-beilis/pytorch_dlprim/releases/download/0.2.0/pytorch_ocl-0.2.0+torch2.6-cp311-none-win_amd64.whl
uv pip install onnxruntime-directml
uv pip install -r requirements.txt
Note:
- OpenCL support appears to have been discontinued.
- Install only on Python 3.11, because there are no builds for Python 3.10 with torch 2.6.0.
- Demucs may overload the GPU and run out of memory. If you need to use Demucs, open main\configs\config.json and set demucs_cpu_mode to true.
- DDP multi-GPU training is not supported with OpenCL.
- Some algorithms must run on the CPU, so the GPU may not be fully utilized.
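The Demucs note above can also be applied with a short script. This sketch assumes that demucs_cpu_mode is a top-level key in main\configs\config.json (check your copy of the file; if the key is nested, adjust the lookup). set_config_flag is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def set_config_flag(config_path: Path, key: str = "demucs_cpu_mode", value: bool = True) -> dict:
    """Load a JSON config, set a top-level key (assumed location), and write the file back."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config[key] = value  # Python True is written as JSON true
    config_path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config


if __name__ == "__main__":
    cfg = Path("main") / "configs" / "config.json"  # path from the note above
    if cfg.exists():
        set_config_flag(cfg)
        print("demucs_cpu_mode set to true in", cfg)
    else:
        print("config not found:", cfg)
```

Rewriting the file re-indents it with 4 spaces; the values are unchanged apart from the flag.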
For DIRECTML (AMD)
uv pip install numpy==1.26.4 numba==0.61.0
uv pip install torch==2.4.1 torchaudio==2.4.1 torchvision
uv pip install torch-directml==0.2.5.dev240914
uv pip install onnxruntime-directml
uv pip install -r requirements.txt
Note:
- DirectML has not been actively developed for a long time.
- DirectML does not handle multi-threaded tasks well, so extraction is often limited to a single thread.
- DirectML partially supports fp16, but using it is not recommended because performance may be no better than fp32.
- DirectML lacks a function for freeing memory. I have written a simple memory-cleanup function, but it may not be very effective.
- DirectML is designed for inference, not training. It can run training tasks, but this is not recommended.
Using with Google Colab
- Open the TeraStudio-RVC notebook in Google Colab.
- Step 1: Run the Installation cell and wait for it to finish.
- Step 2: Run the Open Usage Interface cell. The interface prints two URLs: 0.0.0.0:7680 and a clickable Gradio link. Click the Gradio link to open the interface.
Run run_app.bat to open the user interface, or tensorboard.bat to open the training monitoring dashboard. (Note: do not close the Command Prompt or Terminal window while they are running.)
run_app.bat / tensorboard.bat
To launch the user interface directly (add --allow_all_disk to the command to let Gradio access files outside the project):
env\Scripts\python.exe main\app\app.py --open
To monitor training with TensorBoard:
env\Scripts\python.exe main\app\run_tensorboard.py
To use the command-line interface:
python main\app\parser.py --help
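If you launch the project from your own Python wrapper, the commands above can be assembled as argument lists. This sketch only uses the paths and flags that appear in this README (--open, --allow_all_disk, --help); build_command and SCRIPTS are hypothetical names, and the script only prints the command unless you run it with --run.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

ENV_PYTHON = Path("env") / "Scripts" / "python.exe"  # Windows venv layout from Step 3

SCRIPTS = {
    "app": Path("main") / "app" / "app.py",
    "tensorboard": Path("main") / "app" / "run_tensorboard.py",
    "cli": Path("main") / "app" / "parser.py",
}


def build_command(target: str, *extra_args: str, python: Path = ENV_PYTHON) -> List[str]:
    """Return the argv list for one of the entry points shown above."""
    return [str(python), str(SCRIPTS[target]), *extra_args]


if __name__ == "__main__":
    # Open the UI and let Gradio read files outside the project folder.
    cmd = build_command("app", "--open", "--allow_all_disk")
    print(" ".join(cmd))
    if "--run" in sys.argv:  # dry run by default; pass --run to actually launch
        subprocess.run(cmd, check=True)
```

Passing a list (rather than a single string) to subprocess.run avoids quoting problems with paths that contain spaces.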
- New encoders such as MRF HIFIGAN still lack a complete set of pre-trained models.
- The MRF HIFIGAN and REFINEGAN encoders do not support training without pitch.
- Power training may improve model quality, but there are no pre-trained models specifically for this feature yet.
- Models in the Vietnamese-RVC repository are collected from various sources, including AI Hub, HuggingFace, and other repositories. They may carry different licenses and copyrights.
- The TeraStudio-RVC project is developed for research, learning, and personal entertainment. I neither encourage nor take responsibility for any misuse of voice conversion technology for fraud, identity impersonation, or violation of the privacy or copyrights of any individual or organization.
- Users take full responsibility for their use of this software and must comply with the laws in effect in the country where they reside or operate.
- Using the voices of celebrities, real people, or public figures is only allowed with permission or when it does not violate the law, ethics, or the rights of the parties involved.
- The project author bears no legal responsibility for any consequences arising from the use of this software.
- You must ensure that the audio content you upload and convert through this project does not infringe the intellectual property rights of any third party.
- You may not use this project for any illegal activity, including but not limited to fraud, harassment, or causing harm to others.
- You are solely responsible for any damages resulting from improper use of the product.
- I will not be responsible for any direct or indirect damages arising from the use of this project.
| Work | Author | License |
|---|---|---|
| Applio | IAHispano | MIT License |
| Python-audio-separator | Nomad Karaoke | MIT License |
| Retrieval-based-Voice-Conversion-WebUI | RVC Project | MIT License |
| RVC-ONNX-INFER-BY-Anh | Phạm Huỳnh Anh | MIT License |
| Torch-Onnx-Crepe-By-Anh | Phạm Huỳnh Anh | MIT License |
| Hubert-No-Fairseq | Phạm Huỳnh Anh | MIT License |
| Local-atten | Phil Wang | MIT License |
| TorchFcpe | CN_ChiTu | MIT License |
| FcpeONNX | Yury deiteris | MIT License |
| ContentVec | Kaizhi Qian | MIT License |
| Mediafiredl | Santiago Ariel Mansilla | MIT License |
| Noisereduce | Tim Sainburg | MIT License |
| World.py-By-Anh | Phạm Huỳnh Anh | MIT License |
| Mega.py | Marco Trevisan | No License |
| Gdown | Kentaro Wada | MIT License |
| Whisper | OpenAI | MIT License |
| PyannoteAudio | pyannote | MIT License |
| AudioEditingCode | Hila Manor | MIT License |
| StftPitchShift | Jürgen Hock | MIT License |
| Penn | Interactive Audio Lab | MIT License |
| Voice Changer | Yury deiteris | MIT License |
| Pesto | Sony CSL Paris | LGPL 3.0 |
- If the built-in error reporting does not work, you can report issues to us via the project's GitHub Issues page.