Mini Project Report on AI-generated voice cloning using GANs, leveraging the Mozilla Common Voice dataset to train and evaluate generative models for realistic speech synthesis.

sagar-0208/voice-cloning-gan

AI-Generated Voice Cloning Using GANs

This repository contains the Mini Project Report (MPR) notebook for "AI-Generated Voice Cloning Using GANs", focused on developing a system that synthesizes human-like voices using Generative Adversarial Networks (GANs). The project utilizes the Mozilla Common Voice dataset to train and evaluate the generative model on diverse speech samples.

🎯 Project Objective

To build an AI system capable of cloning human voices by training a GAN architecture on real-world multilingual voice data. The aim is to replicate the natural tone, pitch, and speaking style of a given speaker through synthesized speech samples.

🧠 Key Components

  • Generative Adversarial Network (GAN) Architecture:

    • Custom generator and discriminator models tailored for raw audio signal generation.
    • Feature extraction techniques applied before feeding audio into the GAN.
  • Voice Preprocessing & Feature Engineering:

    • Audio normalization, silence trimming, and spectrogram generation.
    • Conversion to Mel spectrograms for stable GAN training.
  • Training Loop:

    • Balanced generator-discriminator training cycle.
    • Loss functions customized for speech signal characteristics.
  • Voice Cloning Evaluation:

    • Comparison of real vs. synthesized audio using waveform visualization and audio output.
    • Metric evaluation (e.g., Spectral Convergence, Signal-to-Noise Ratio).
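
The normalization and silence-trimming steps listed under Voice Preprocessing can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's actual code: the function names, the `target_peak` value, and the amplitude `threshold` are assumptions chosen for the example. In practice these steps, and the Mel spectrogram conversion that follows them, are often delegated to an audio library such as librosa (`librosa.effects.trim`, `librosa.feature.melspectrogram`); whether the notebook does so is not stated here.

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    return list(samples[loud[0]:loud[-1] + 1])


def preprocess(samples, target_peak=0.95, threshold=0.01):
    """Normalize first, then trim, so the threshold is relative to the peak."""
    return trim_silence(peak_normalize(samples, target_peak), threshold)


# Hypothetical toy clip: quiet edges around a short burst of signal.
clip = [0.0, 0.001, 0.2, -0.5, 0.1, 0.0]
cleaned = preprocess(clip)
```

Normalizing before trimming means the silence threshold is applied relative to each clip's own peak, which helps when recordings differ widely in loudness.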

📦 Dataset

This project uses the Mozilla Common Voice Dataset:

  • Open-source, multilingual dataset of speech samples.
  • Provides thousands of validated clips across multiple speakers, languages, and accents.
  • Used for both training and evaluation phases.
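
Because voice cloning depends on knowing which clips belong to which speaker, a first step is usually to index the dataset by speaker. The sketch below assumes the downloaded release follows the usual Common Voice layout, with a tab-separated metadata file (such as `validated.tsv`) containing `client_id` and `path` columns. The helper name `clips_by_speaker` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def clips_by_speaker(tsv_file):
    """Group clip paths by speaker id from a Common Voice style TSV file."""
    speakers = defaultdict(list)
    # QUOTE_NONE: sentences may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        speakers[row["client_id"]].append(row["path"])
    return dict(speakers)


# Tiny in-memory stand-in for validated.tsv (made-up rows, not real data).
sample = io.StringIO(
    "client_id\tpath\tsentence\n"
    "spk_a\tclip_001.mp3\tHello there.\n"
    "spk_b\tclip_002.mp3\tGood morning.\n"
    "spk_a\tclip_003.mp3\tHow are you?\n"
)
index = clips_by_speaker(sample)
```

With a real download, the same function could be called on `open("data/validated.tsv", encoding="utf-8", newline="")`, assuming the metadata file sits in the `data/` directory.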

πŸ—‚οΈ File Structure

voice-cloning-gan/
├── AAI MPR.ipynb      # Main project notebook
└── README.md          # Project documentation

βš™οΈ Setup Instructions

  1. Clone the repository:

    git clone https://github.com/sagar-0208/voice-cloning-gan.git
    cd voice-cloning-gan
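  2. Install dependencies (requirements.txt is not listed in the file structure above; if it is missing, install the packages the notebook imports instead):

    pip install -r requirements.txt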
  3. Download dataset:

    • Visit Common Voice and download your preferred language version.
    • Extract the archive and place the audio files inside a data/ directory (if the dataset is stored outside the notebook).
  4. Run the notebook:

    jupyter notebook "AAI MPR.ipynb"

🧪 Results Preview

  • Synthesized voice samples generated after each training epoch.
  • Visual comparison between input voice spectrograms and generated outputs.
  • Evaluation through waveform plots and perceptual listening.
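
Alongside plots and listening tests, the two metrics named under Voice Cloning Evaluation can be computed with a few lines of code. The sketch below uses the common definitions: spectral convergence as the Frobenius norm of the spectrogram error divided by the Frobenius norm of the reference, and SNR in decibels as the ratio of reference energy to error energy. It is an assumption that the notebook uses these exact definitions, and the function names are illustrative.

```python
import math


def spectral_convergence(ref_mag, est_mag):
    """Relative Frobenius-norm error between two magnitude spectrograms.

    Both arguments are lists of frames (lists of floats) with equal shapes.
    0.0 means identical spectrograms; larger values mean a worse match.
    """
    num = sum(
        (r - e) ** 2
        for ref_frame, est_frame in zip(ref_mag, est_mag)
        for r, e in zip(ref_frame, est_frame)
    )
    den = sum(r ** 2 for ref_frame in ref_mag for r in ref_frame)
    return math.sqrt(num) / math.sqrt(den)


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise."""
    signal = sum(r ** 2 for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


# Hypothetical toy inputs, only to show the call pattern.
sc = spectral_convergence([[3.0, 4.0]], [[3.0, 4.0]])
snr = snr_db([1.0, 1.0], [0.9, 1.1])
```

Waveform SNR is only meaningful when the real and synthesized signals are time-aligned and equal in length, which generated speech often is not. Spectral convergence on magnitude spectrograms is usually the more forgiving of the two.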

📌 Status

✅ Core GAN architecture implemented
🚧 Currently testing across multiple speakers and languages
📊 Future improvements include attention layers and multi-speaker conditioning
