A deep learning-based Voice Activity Detection (VAD) system implemented from scratch using Bidirectional LSTM (BiLSTM) networks. This project demonstrates the application of modern AI techniques in speech processing, focusing on detecting speech and non-speech regions in audio signals, particularly in noisy environments.
- Deep Learning Architecture: Utilizes BiLSTM for temporal modeling of speech patterns
- Noise Robustness: Trained with noise augmentation to handle low SNR conditions
- Real-time Capable: Optimized for frame-level predictions suitable for streaming audio
- Open Source: Built with PyTorch and publicly available datasets
- Educational: Comprehensive implementation with detailed explanations
- Python 3.8 or higher
- uv package manager
- Jupyter Notebook or JupyterLab
- Clone the repository:

  ```bash
  git clone https://github.com/hasithdd/VAD-Model-from-Scratch.git
  cd VAD-Model-from-Scratch
  ```

- Install dependencies with uv:

  ```bash
  uv sync
  ```

  This will create a virtual environment and install all required packages as specified in `pyproject.toml`.

- Activate the environment (optional, uv handles this automatically):

  ```bash
  uv run python --version
  ```
This project uses the Google Speech Commands Dataset v0.01, a public dataset containing:
- 65,000 one-second audio clips of 30 different words
- Background noise samples
- Mono WAV files at 16 kHz sample rate
The dataset is automatically downloaded and prepared during notebook execution.
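For reference, the same data can also be fetched with torchaudio's built-in dataset class. A minimal sketch, assuming `torchaudio` is installed; the `./data` directory name is an arbitrary choice:

```python
import os

import torchaudio

# Fetch Speech Commands v0.01 (about 1.4 GB) into ./data.
# The notebook performs its own download; this is only an equivalent reference.
os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.SPEECHCOMMANDS(
    root="./data",
    url="speech_commands_v0.01",
    download=True,
)

# Each item is a mono 16 kHz clip with its label and speaker metadata.
waveform, sample_rate, label, speaker_id, utterance_number = dataset[0]
print(sample_rate, label, tuple(waveform.shape))  # e.g. 16000 "bed" (1, 16000)
```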
The VAD system employs the following components (a PyTorch sketch follows the list):
- Feature Extraction: Mel-Frequency Cepstral Coefficients (MFCCs)
- Network: Bidirectional LSTM with 2 layers and 128 hidden units
- Output: Frame-level binary classification (speech/non-speech)
- Sequence Length: 8-second overlapping windows
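A minimal PyTorch sketch of this architecture. The class name, the input feature size (`n_mfcc=40`), and the single-logit output head are illustrative assumptions, not the notebook's exact code:

```python
import torch
import torch.nn as nn


class BiLSTMVAD(nn.Module):
    """Frame-level VAD: MFCC frames in, per-frame speech logits out."""

    def __init__(self, n_mfcc: int = 40, hidden: int = 128, layers: int = 2):
        super().__init__()
        # 2-layer bidirectional LSTM with 128 hidden units, as described above.
        self.lstm = nn.LSTM(
            input_size=n_mfcc,
            hidden_size=hidden,
            num_layers=layers,
            batch_first=True,
            bidirectional=True,
        )
        # 2 * hidden because forward and backward states are concatenated.
        self.classifier = nn.Linear(2 * hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, n_mfcc) -> logits: (batch, frames)
        out, _ = self.lstm(x)
        return self.classifier(out).squeeze(-1)


model = BiLSTMVAD()
# 4 windows of 800 frames each (8 s of audio at a 10 ms hop).
logits = model(torch.randn(4, 800, 40))
probs = torch.sigmoid(logits)  # per-frame speech probability
```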
- Start Jupyter Notebook:

  ```bash
  uv run jupyter notebook
  ```

- Open the main notebook: navigate to `Cw1_w1987535_HasithVikasithaDharmarathna.ipynb` and open it.
- Run the cells sequentially:
  - The notebook covers data preparation, model training, and evaluation
  - Training runs for approximately 10 epochs; a GPU-enabled system is recommended
  - Evaluation produces a confusion matrix and performance metrics
- Key sections:
  - Data loading and preprocessing
  - Feature extraction (MFCC; see the sketch after this list)
  - Model training with BiLSTM
  - Validation and performance analysis
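A sketch of the MFCC front end using `torchaudio.transforms.MFCC`. The parameter values (40 coefficients, 25 ms windows, 10 ms hop at 16 kHz) are typical choices and may differ from the notebook's settings:

```python
import torch
import torchaudio

# MFCC extractor: n_fft=400 and hop_length=160 correspond to
# 25 ms windows with a 10 ms hop at a 16 kHz sample rate.
mfcc = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=40,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 64},
)

waveform = torch.randn(1, 16000)  # one second of placeholder audio
features = mfcc(waveform)         # (1, 40, frames)
frames = features.squeeze(0).T    # (frames, 40), ready for the BiLSTM
```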
The model achieves:
- Accuracy: 86.3%
- Precision: 86.1%
- Recall: 82.7%
- F1 Score: 84.4%
Evaluated on noisy validation data with -10 dB SNR.
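These figures come from frame-level predictions. For illustration, a sketch of how such metrics can be computed with scikit-learn, assuming it is available; the arrays below are placeholders, not the project's data:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder frame-level labels (1 = speech) and thresholded predictions.
y_true = np.array([0, 1, 1, 0, 1, 1, 0, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```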
To add new packages:
```bash
uv add package-name
```

To run the test suite:

```bash
uv run python -m pytest
```

For deployment, the trained model can be exported to ONNX:

```bash
uv run python export_model.py  # (if you create this script)
```
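A hypothetical `export_model.py` along these lines, assuming the `BiLSTMVAD` sketch shown earlier and a checkpoint saved as `model.pt` (neither ships with this repo):

```python
# export_model.py -- hypothetical sketch; BiLSTMVAD and model.pt are
# assumptions based on the model sketch above, not files in this repo.
import torch

from model import BiLSTMVAD  # hypothetical module holding the sketch

model = BiLSTMVAD()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 800, 40)  # (batch, frames, n_mfcc)
torch.onnx.export(
    model,
    dummy,
    "vad.onnx",
    input_names=["mfcc"],
    output_names=["logits"],
    # Allow variable batch size and window length at inference time.
    dynamic_axes={
        "mfcc": {0: "batch", 1: "frames"},
        "logits": {0: "batch", 1: "frames"},
    },
)
print("Exported to vad.onnx")
```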
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Based on the coursework for 6COSC020W Applied AI
- Inspired by modern VAD systems like Silero VAD and pyannote.audio
Author: P.A. Hasith Vikasitha Dharmarathna
ID: 20223265
GitHub: hasithdd