Knowledge Distillation for Speech Recognition

This repository implements knowledge distillation for Automatic Speech Recognition (ASR) models using the NVIDIA NeMo framework. The project focuses on training smaller, more efficient FastConformer-Transducer models by distilling knowledge from larger teacher models.

🎯 Project Overview

Knowledge distillation is a technique for transferring knowledge from a large, complex model (teacher) to a smaller, more efficient model (student). This project implements knowledge distillation for speech recognition models, enabling the creation of compact models that maintain competitive performance while requiring fewer computational resources.

Key Features

  • FastConformer-Transducer Architecture: State-of-the-art streaming ASR model
  • Knowledge Distillation: Knowledge transfer from a large teacher model to a smaller student model
  • ONNX Export: Export models for efficient inference with ONNX Runtime
  • LibriSpeech Training: Comprehensive training on the LibriSpeech dataset
  • Multiple Model Sizes: Support for different model architectures (Small, Medium, Large)

πŸ“ Repository Structure

knowledgedistill/
├── README.md                                 # This file
├── train.py                                  # Main training script
├── base.yaml                                 # Base configuration file
├── fast-conformer_transducer_bpe.yaml        # Large model configuration
├── fast-conformer_transducer_bpe_medium.yaml # Medium model configuration (with KD)
├── manifest/                                 # Dataset manifest files
│   ├── train_manifest.json
│   ├── val_manifest.json
│   ├── test_clean_manifest.json
│   └── test_other_manifest.json
├── scripts/                                  # Utility scripts
│   ├── generate_manifest.py                  # Generate LibriSpeech manifests
│   ├── evaluate.py                           # Model evaluation
│   ├── export_onnx.py                        # Export models to ONNX
│   ├── inference_onnx.py                     # ONNX inference
│   └── extract_tokenizer.py                  # Tokenizer utilities
├── tokenizer/                                # BPE tokenizer files
│   ├── tokenizer.model
│   ├── tokenizer.vocab
│   └── vocab.txt
├── teacher/                                  # Teacher model storage
│   └── teacher_model.nemo
├── models/                                   # Trained models
│   └── full_train.nemo
├── experiments/                              # Training experiment logs
│   ├── base/                                 # Base model experiments
│   ├── new/                                  # New model experiments
│   ├── whole_train/                          # Full training experiments
│   └── wandb/                                # Weights & Biases logs
└── Nemo-CM/                                  # NeMo framework (submodule)

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • NVIDIA GPU with CUDA support
  • NeMo framework
  • PyTorch
  • LibriSpeech dataset

Installation

  1. Clone the repository:
git clone <repository-url>
cd knowledgedistill
  2. Install dependencies:
pip install nemo_toolkit
pip install -r requirements.txt  # if available
  3. Download the LibriSpeech dataset and update the dataset paths in the configuration files.
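
After installation, a quick sanity check can confirm that NeMo and its ASR collection import correctly and that a GPU is visible. This is a generic check, not part of the repository's scripts:

import nemo
import nemo.collections.asr as nemo_asr
import torch

# Confirm the toolkit and CUDA are available before starting a long training run.
print("NeMo version:", nemo.__version__)
print("CUDA available:", torch.cuda.is_available())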

Dataset Preparation

  1. Generate manifest files for LibriSpeech:
python scripts/generate_manifest.py

This script processes LibriSpeech directories and creates manifest files for:

  • Training data (train-clean-100, train-clean-360, train-other-500)
  • Validation data (dev-clean, dev-other)
  • Test data (test-clean, test-other)
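
Each manifest is a JSON-lines file in NeMo's standard format: one utterance per line with the audio path, duration, and transcript. A minimal sketch of writing one entry is shown below; the path and values are illustrative, and generate_manifest.py may emit additional fields:

import json

# One LibriSpeech utterance per line, following NeMo's manifest convention.
# The path, duration, and text below are illustrative placeholders.
entry = {
    "audio_filepath": "/data/LibriSpeech/train-clean-100/103/1240/103-1240-0000.flac",
    "duration": 14.1,
    "text": "chapter one missus rachel lynde is surprised",
}

with open("manifest/train_manifest.json", "a") as f:
    f.write(json.dumps(entry) + "\n")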

πŸ‹οΈ Training

Standard Training

Train a FastConformer model without knowledge distillation:

python train.py --config-path=. --config-name=fast-conformer_transducer_bpe

Knowledge Distillation Training

Train a student model with knowledge distillation:

python train.py --config-path=. --config-name=fast-conformer_transducer_bpe_medium \
    model.enable_kd=True \
    model.teacher_model_path=/path/to/teacher_model.nemo \
    model.kd_temperature=4.0 \
    model.kd_alpha=0.7

Configuration Parameters

Knowledge Distillation Parameters

  • enable_kd: Enable/disable knowledge distillation (default: False)
  • teacher_model_path: Path to the pre-trained teacher model
  • kd_temperature: Temperature for softening probability distributions (default: 1.0)
  • kd_alpha: Weight balancing between distillation loss and ground truth loss (default: 0.5)
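
The snippet below is a minimal sketch of how kd_temperature and kd_alpha typically combine a soft-target distillation term with the ground-truth loss, assuming logit-level distillation with a KL term; the exact loss wiring in train.py may differ:

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, ground_truth_loss,
            kd_temperature=4.0, kd_alpha=0.7):
    """Blend a soft-target distillation term with the ground-truth loss."""
    T = kd_temperature
    # Soften both distributions with the temperature before comparing them.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures
    # (as in Hinton et al.'s distillation formulation).
    distill = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    return kd_alpha * distill + (1.0 - kd_alpha) * ground_truth_loss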

Model Architecture Variants

Model Size   d_model   n_heads   n_layers   Parameters   Config File
Small        176       4         16         ~14M         Custom
Medium       256       4         16         ~32M         fast-conformer_transducer_bpe_medium.yaml
Large        512       8         17         ~120M        base.yaml

📊 Evaluation

Evaluate trained models:

python scripts/evaluate.py

This script:

  • Loads the trained model
  • Transcribes test audio files
  • Computes Word Error Rate (WER) and Character Error Rate (CER)
  • Saves transcriptions for analysis
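
A hedged sketch of the evaluation flow is shown below, restoring a .nemo checkpoint and scoring transcripts with jiwer; the actual arguments and output handling in scripts/evaluate.py may differ:

import json

import jiwer
import nemo.collections.asr as nemo_asr

# Restore the trained transducer model from its .nemo archive.
model = nemo_asr.models.ASRModel.restore_from("models/full_train.nemo")
model.eval()

# Collect audio paths and reference texts from a test manifest.
refs, paths = [], []
with open("manifest/test_clean_manifest.json") as f:
    for line in f:
        entry = json.loads(line)
        paths.append(entry["audio_filepath"])
        refs.append(entry["text"])

out = model.transcribe(paths)
# Depending on the NeMo version, transcribe() returns plain strings,
# Hypothesis objects, or a (best, n-best) tuple for transducer models.
if isinstance(out, tuple):
    out = out[0]
hyps = [h.text if hasattr(h, "text") else str(h) for h in out]

print("WER:", jiwer.wer(refs, hyps))
print("CER:", jiwer.cer(refs, hyps))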

🔧 Model Export and Deployment

ONNX Export

Export trained models to ONNX format for efficient inference:

python scripts/export_onnx.py

This creates:

  • encoder.onnx: Encoder model
  • decoder.onnx: Decoder model
  • joiner.onnx: Joint network
  • preprocessor.ts: TorchScript preprocessor
  • tokens.txt: Vocabulary file
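
As a quick check of the exported pieces, each ONNX file can be opened with onnxruntime and its input/output signature inspected. This is a generic sketch based on the file names listed above, not tied to the exact shapes the export script produces:

import onnxruntime as ort

for name in ("encoder.onnx", "decoder.onnx", "joiner.onnx"):
    session = ort.InferenceSession(name, providers=["CPUExecutionProvider"])
    print(f"--- {name} ---")
    for inp in session.get_inputs():
        print("  input :", inp.name, inp.shape, inp.type)
    for out in session.get_outputs():
        print("  output:", out.name, out.shape, out.type)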

ONNX Inference

Run inference with exported ONNX models:

python scripts/inference_onnx.py --audio_file /path/to/audio.wav

📈 Experiment Tracking

The project uses Weights & Biases (wandb) for experiment tracking:

  • Training metrics (loss, WER, learning rate)
  • Model configurations
  • Hyperparameter sweeps
  • Experiment comparison

Configure wandb in the experiment manager section of config files:

exp_manager:
  create_wandb_logger: true
  wandb_logger_kwargs:
    name: experiment_name
    project: project_name

πŸŽ›οΈ Configuration Files

Base Configuration (base.yaml)

  • Large FastConformer model (512 d_model, 17 layers)
  • Standard training without knowledge distillation
  • Optimized for high accuracy

Medium Configuration (fast-conformer_transducer_bpe_medium.yaml)

  • Medium FastConformer model (256 d_model, 16 layers)
  • Knowledge distillation support
  • Balanced trade-off between efficiency and accuracy

Key differences:

  • Smaller model architecture (32M vs 120M parameters)
  • Knowledge distillation parameters
  • Adjusted learning rates and batch sizes

πŸ“ Training Tips

  1. Batch Size: Adjust based on GPU memory:

    • 16GB GPU: batch_size=8-16
    • 32GB GPU: batch_size=16-32
    • 80GB GPU: batch_size=32-64
  2. Knowledge Distillation:

    • Use kd_temperature=3-5 for better knowledge transfer
    • Balance kd_alpha between 0.3 and 0.7, depending on teacher quality
    • Ensure teacher and student use the same tokenizer
  3. Training Duration:

    • Medium models: 50-100 epochs
    • Large models: 100-500 epochs
    • Monitor validation WER for early stopping
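
For the early-stopping tip, one option is a PyTorch Lightning callback that watches the validation WER. This is only a sketch: the metric name val_wer is an assumption and should match whatever the model actually logs in your runs:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# Stop training when validation WER has not improved for 10 validation runs.
# "val_wer" is an assumed metric name; check what the model actually logs.
early_stop = EarlyStopping(monitor="val_wer", mode="min", patience=10)

trainer = pl.Trainer(max_epochs=100, callbacks=[early_stop])
# The NeMo model would then be trained with trainer.fit(model),
# as in a typical NeMo training script.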

πŸ” Results and Performance

The knowledge distillation approach typically achieves:

  • Model Size Reduction: 70-80% parameter reduction (120M → 32M)
  • Performance Retention: 90-95% of teacher model accuracy
  • Inference Speed: 2-3x faster inference
  • Memory Usage: 60-70% reduction in GPU memory

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

  • NVIDIA NeMo team for the excellent ASR framework
  • LibriSpeech corpus for training data
  • FastConformer architecture contributors

For questions or issues, please open an issue in the repository or contact the maintainers.
