Automatic Speech Recognition (ASR) with PyTorch

About · Installation · How To Train · How To Evaluate · Credits · License

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n project_env python=3.10
    
    # activate env
    conda activate project_env
    
    # build dependencies needed to compile youtokentome
    conda install -c conda-forge gcc=12.1.0
    pip install Cython
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

If you have issues installing the youtokentome library, please contact me or the youtokentome authors.

How To Train

You need a single A100 80GB GPU to reproduce the training exactly; otherwise, implement and use gradient accumulation, as sketched below.
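
For reference, a minimal gradient-accumulation sketch in PyTorch (the toy model, optimizer, and loader are illustrative assumptions, not this repo's actual training loop):

    import torch
    from torch import nn

    # Toy setup (illustrative only): any model/optimizer/loader works the same way.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(32)]
    loss_fn = nn.MSELoss()

    accum_steps = 8  # emulate a batch 8x larger than what fits in memory

    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y)
        (loss / accum_steps).backward()  # scale so gradients average over the window
        if (step + 1) % accum_steps == 0:
            optimizer.step()             # one optimizer step per accum_steps batches
            optimizer.zero_grad()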

To train a model, register in WandB and run the following commands.

Two-step training:

python3 train.py -cn=deepspeech_config.yaml writer.run_name=full_deepspeech_beam_search_torch_zero_to_hero 

Stop this run at epoch 23 (due to convergence) and resume it with the second config:

python3 train.py -cn=deepspeech_config_part2.yaml writer.run_name=full_deepspeech_beam_search_torch_zero_to_hero_part_2_more_augs_wo_limits trainer.resume_from=<ABSOLUTE_PATH_TO_DIRECTORY>/saved/full_deepspeech_beam_search_torch_zero_to_hero/model_best.pth

Stop the second run at epoch 45 (in total, counting the first step). The two steps take about 24 hours to train.

Note that the base model in all configs is DeepSpeech2-repack-by-teasgen, but you may additionally use a Conformer (take a look at conformer_config.yaml).

You may also use a BPE tokenizer instead of plain characters. To do so, first train BPE on the full LibriSpeech dataset:

python3 src/utils/train_bpe.py --ls-indices-dir <ABSOLUTE_PATH_TO_DIRECTORY>/data/datasets/librispeech --dir-to-save-model <ABSOLUTE_PATH_TO_DIRECTORY>/data/bpe

Then run training the same way as with the char tokenizer; for more Hydra details, take a look at deepspeech_config_bpe.yaml. Note that in this mode you cannot use beam search with an LM. You may download my trained BPE model from https://disk.yandex.ru/d/6KNUINjFGn9ofQ
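
For reference, training and applying a BPE model with youtokentome directly looks roughly like this (paths and vocab_size are illustrative assumptions; the repo's src/utils/train_bpe.py handles the LibriSpeech-specific preprocessing):

    import youtokentome as yttm

    # Train a BPE model on a plain-text corpus (one utterance per line).
    yttm.BPE.train(data="librispeech_texts.txt", vocab_size=4000,
                   model="data/bpe/bpe.model")

    # Load the model back and encode/decode text.
    bpe = yttm.BPE(model="data/bpe/bpe.model")
    ids = bpe.encode(["hello world"], output_type=yttm.OutputType.ID)
    print(ids)              # list of token-id lists
    print(bpe.decode(ids))  # ['hello world']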

The training report and other logs are available in WandB: https://api.wandb.ai/links/teasgen/dx35cnsu

How To Evaluate

All generated texts will be saved into the data/saved/evals directory with corresponding names. Download the pretrained model from https://disk.yandex.ru/d/6KNUINjFGn9ofQ and put it at saved/full_deepspeech_beam_search_torch_zero_to_hero_part_2_more_augs_wo_limits/model_best.pth

  1. To run inference, provide a custom dataset (possibly without transcriptions) in the same format as data/test_data and run the command below. Optionally, you can use the LibriSpeech dataset format (set datasets=libri_speech_eval in the Hydra config).

    python3 inference.py -cn=inference.yaml dataloader.batch_size=5

    Set dataloader.batch_size to at most len(dataset).

    Optionally, set text_encoder.decoder_type to your preferred decoding algorithm (a minimal sketch of the argmax option follows the list). Possible values are:

    • argmax
    • beam_search (my slow implementation)
    • beam_search_torch (fast batched algorithm)
    • beam_search_lm (slow single-sample beam search with open source kenlm)
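
    For intuition, greedy (argmax) CTC decoding takes the best token per frame, collapses consecutive repeats, and drops blanks. A minimal sketch (the blank index and ind2char mapping are assumptions, not the repo's actual text_encoder):

    import torch

    def ctc_argmax_decode(log_probs: torch.Tensor, ind2char: dict, blank: int = 0) -> str:
        """Greedy CTC decoding: log_probs has shape (time, vocab_size)."""
        best = log_probs.argmax(dim=-1).tolist()  # best token per frame
        out, prev = [], blank
        for idx in best:
            if idx != prev and idx != blank:      # collapse repeats, drop blanks
                out.append(ind2char[idx])
            prev = idx
        return "".join(out)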

    To speed up decoding or raise the score, you may change the beam_size value in the Hydra config; the reported values were obtained with beam_size=50. Once you have the transcriptions, run the following command to calculate the WER and CER metrics:

    export PYTHONPATH=./ && python3 src/utils/calculate_cer_wer.py --predicts-dir data/saved/evals/test --gt-dir data/test_data/transcriptions
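
    Both metrics are the Levenshtein (edit) distance normalized by the reference length, computed over words (WER) and characters (CER). A self-contained sketch of the idea (the repo's calculate_cer_wer.py may differ in details such as text normalization):

    def edit_distance(ref, hyp):
        """Classic Levenshtein distance via a rolling 1-D DP table."""
        dp = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev_diag, dp[0] = dp[0], i
            for j, h in enumerate(hyp, 1):
                prev_diag, dp[j] = dp[j], min(
                    dp[j] + 1,             # deletion
                    dp[j - 1] + 1,         # insertion
                    prev_diag + (r != h),  # substitution (free if match)
                )
        return dp[-1]

    def wer(ref: str, hyp: str) -> float:
        return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

    def cer(ref: str, hyp: str) -> float:
        return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)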
  2. If you want to calculate metrics on a dataset, provide it in the same format as data/test_data and run:

    python3 inference.py -cn=inference_and_metrics.yaml dataloader.batch_size=5

    Optionally, you can use the LibriSpeech dataset format (set datasets=libri_speech_eval in the Hydra config).

Credits

This repository is based on a PyTorch Project Template.

License

See the LICENSE file.
