About • Installation • How To Train • How To Evaluate • Credits • License
Follow these steps to install the project:
- (Optional) Create and activate a new environment using conda:

  ```bash
  # create env
  conda create -n project_env python=3.10
  # activate env
  conda activate project_env
  # support youtokentome
  conda install -c conda-forge gcc=12.1.0
  pip install Cython
  ```
- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Install pre-commit:

  ```bash
  pre-commit install
  ```
If you have issues installing the youtokentome library, please contact me or the youtokentome authors.
You need a single A100-80GB GPU to reproduce the training exactly; otherwise, implement and use gradient accumulation (see the sketch below).
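A minimal gradient-accumulation sketch, assuming a standard PyTorch training loop; the model, optimizer, criterion, and dataloader below are toy stand-ins, not this repository's actual objects:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the repo's real model, optimizer, criterion, and loader.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
loader = DataLoader(data, batch_size=8)

accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y)
    (loss / accum_steps).backward()  # scale so accumulated grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by accum_steps keeps the accumulated gradient equal to the gradient of the average loss over the larger effective batch.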
To train a model, register in WandB and run the following commands.
Two-step training:
```bash
python3 train.py -cn=deepspeech_config.yaml writer.run_name=full_deepspeech_beam_search_torch_zero_to_hero
```

Stop this training at epoch 23 (due to convergence) and resume it with the second config:
```bash
python3 train.py -cn=deepspeech_config_part2.yaml writer.run_name=full_deepspeech_beam_search_torch_zero_to_hero_part_2_more_augs_wo_limits trainer.resume_from=<ABSOLUTE_PATH_TO_DIRECTORY>/saved/full_deepspeech_beam_search_torch_zero_to_hero/model_best.pth
```

Stop the second training at epoch 45 (counted in total with the first step). The two steps take about 24 hours to train.
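Conceptually, resuming restores the model and optimizer state from the saved checkpoint. A generic PyTorch sketch of the idea; the repository's actual checkpoint keys and trainer internals may differ:

```python
import torch
from torch import nn

# Generic checkpoint save/resume sketch; the repo's checkpoint format may differ.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Saving (what training conceptually does for model_best.pth).
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "epoch": 23},
    "model_best.pth",
)

# Resuming (what trainer.resume_from conceptually does).
ckpt = torch.load("model_best.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_epoch = ckpt["epoch"] + 1
```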
Note that in all configs the base model is DeepSpeech2-repack-by-teasgen, but you may additionally use a Conformer (take a look at conformer_config.yaml).
You may also use a BPE tokenizer instead of plain characters. To do so, first train BPE on the full LibriSpeech dataset:
```bash
python3 src/utils/train_bpe.py --ls-indices-dir <ABSOLUTE_PATH_TO_DIRECTORY>/data/datasets/librispeech --dir-to-save-model <ABSOLUTE_PATH_TO_DIRECTORY>/data/bpe
```

Then run training the same way as with the char tokenizer. For more Hydra details, take a look at deepspeech_config_bpe.yaml. Unfortunately, in this mode you cannot use beam search with an LM. You may download my trained BPE model from https://disk.yandex.ru/d/6KNUINjFGn9ofQ
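For reference, a rough sketch of what BPE training and encoding look like with youtokentome; the corpus, paths, and vocab_size here are illustrative assumptions, not the values train_bpe.py actually uses:

```python
import youtokentome as yttm

# Illustrative toy corpus and parameters -- train_bpe.py may use different ones.
corpus_path = "train_texts.txt"
model_path = "bpe.model"
with open(corpus_path, "w") as f:
    f.write("hello world\nworld of speech recognition\nhello speech\n")

# Train a BPE model on the text corpus (one transcription per line).
yttm.BPE.train(data=corpus_path, vocab_size=30, model=model_path)

# Load the model and round-trip a transcription.
bpe = yttm.BPE(model=model_path)
ids = bpe.encode(["hello world"], output_type=yttm.OutputType.ID)
print(ids, bpe.decode(ids))
```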
The training report and other logs are available in WandB: https://api.wandb.ai/links/teasgen/dx35cnsu
All generated texts will be saved into the data/saved/evals directory with corresponding names. Download the pretrained model from https://disk.yandex.ru/d/6KNUINjFGn9ofQ and put it at saved/full_deepspeech_beam_search_torch_zero_to_hero_part_2_more_augs_wo_limits/model_best.pth
- To run inference, provide a custom dataset (possibly without transcriptions) in the same format as data/test_data, and run:

  ```bash
  python3 inference.py -cn=inference.yaml dataloader.batch_size=5
  ```

  Optionally, you can use the LibriSpeech dataset format (set datasets=libri_speech_eval in the Hydra config).
  Set dataloader.batch_size to no more than len(dataset).
  Optionally, set text_encoder.decoder_type to the preferred evaluation algorithm. Possible values are:

  - argmax (illustrated in the sketch after this list)
  - beam_search (my slow implementation)
  - beam_search_torch (fast batched algorithm)
  - beam_search_lm (slow single-sample beam search with the open-source KenLM)
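  A hypothetical sketch of the argmax option for CTC outputs (collapse repeated frames, drop blanks); the repository's actual decoding lives in text_encoder, and ID2CHAR below is a toy vocabulary:

  ```python
  import torch

  # Toy CTC vocabulary; the repo's text_encoder defines the real one.
  BLANK = 0
  ID2CHAR = {1: "h", 2: "e", 3: "l", 4: "o"}

  def ctc_argmax_decode(log_probs: torch.Tensor) -> str:
      """log_probs: (time, vocab) per-frame log-probabilities."""
      ids = log_probs.argmax(dim=-1).tolist()
      out, prev = [], BLANK
      for i in ids:
          if i != prev and i != BLANK:  # skip repeats and blanks
              out.append(ID2CHAR[i])
          prev = i
      return "".join(out)

  # Frames predicting "h h <blank> e l <blank> l o" decode to "hello".
  frames = torch.full((8, 5), -5.0)
  for t, i in enumerate([1, 1, 0, 2, 3, 0, 3, 4]):
      frames[t, i] = 0.0
  print(ctc_argmax_decode(frames))  # hello
  ```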
  To speed up decoding or to raise the score, you may change the beam_size value in the Hydra config. For the reported values I used beam_size=50.

  When you get transcriptions, run the following command to calculate the WER and CER metrics:

  ```bash
  export PYTHONPATH=./ && python3 src/utils/calculate_cer_wer.py --predicts-dir data/saved/evals/test --gt-dir data/test_data/transcriptions
  ```
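  As a reminder of what these metrics measure, a minimal illustration of WER and CER as edit distances normalized by reference length; the repo's calculate_cer_wer.py may differ in details such as text normalization and averaging:

  ```python
  # Minimal WER/CER illustration; not the repo's actual implementation.
  def levenshtein(a, b) -> int:
      prev = list(range(len(b) + 1))
      for i, x in enumerate(a, 1):
          cur = [i]
          for j, y in enumerate(b, 1):
              cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
          prev = cur
      return prev[-1]

  def wer(ref: str, hyp: str) -> float:
      return levenshtein(ref.split(), hyp.split()) / len(ref.split())

  def cer(ref: str, hyp: str) -> float:
      return levenshtein(ref, hyp) / len(ref)

  print(wer("the cat sat", "the cat sit"))  # 0.3333... (1 of 3 words wrong)
  print(cer("the cat sat", "the cat sit"))  # 0.0909... (1 of 11 chars wrong)
  ```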
- If you want to calculate metrics on a dataset, provide it in the same format as data/test_data, and run:

  ```bash
  python3 inference.py -cn=inference_and_metrics.yaml dataloader.batch_size=5
  ```
  Optionally, you can use the LibriSpeech dataset format (set datasets=libri_speech_eval in the Hydra config).
This repository is based on a PyTorch Project Template.