STT = Speech-to-Text; SER = Speech Emotion Recognition; TTS = Text-to-Speech
These instructions are for Ubuntu-based systems.
In the commands below, $PROJECT_ROOT is the path to this project's root directory.
- Follow the instructions in the Requirements section of this page: https://github.com/Uberi/speech_recognition?tab=readme-ov-file#requirements
- Follow the instructions here to download the SER (Speech Emotion Recognition) model weights into the directory $PROJECT_ROOT/model_weight.
- Under the project root directory, copy the content of the file .env.example into a new file named .env, then set the environment variables appropriately. To set up the Google auth environment, please also see the Google Speech-to-Text API guide (google it!).
- Run:
cd $PROJECT_ROOT
virtualenv venv
source ./venv/bin/activate
pip install -r requirements.txt
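
The .env step above can be scripted so that an existing .env is never overwritten. The following is a minimal sketch; `init_env_file` is a hypothetical helper name, not something this project ships, and it assumes only that .env.example sits in the project root as described above.

```shell
# Hypothetical helper (not part of the project): create .env from .env.example
# without overwriting an existing .env.
init_env_file() {
    local root="${1:-$PROJECT_ROOT}"
    if [ -f "$root/.env" ]; then
        echo ".env already exists; leaving it unchanged"
        return 0
    fi
    if [ ! -f "$root/.env.example" ]; then
        echo ".env.example not found in $root" >&2
        return 1
    fi
    cp "$root/.env.example" "$root/.env"
    echo "Created $root/.env; edit it to set the environment variables"
}
```

Calling `init_env_file "$PROJECT_ROOT"` copies the template once. The values inside .env still have to be filled in by hand.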
To start the server, run:
cd $PROJECT_ROOT
source ./venv/bin/activate
fastapi run main.py --host 0.0.0.0 --port 7123
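
Before starting the server, it can help to confirm that the setup steps left the expected files in place. The sketch below checks only for the paths named in these instructions (.env, model_weight, and the venv activation script). `check_setup` is a hypothetical helper, and it does not verify that the weights or the environment variables are actually valid.

```shell
# Hypothetical pre-flight check (not part of the project).
# Returns 0 if the expected paths exist under the given root, 1 otherwise.
check_setup() {
    local root="${1:-$PROJECT_ROOT}"
    local missing=0
    # .env is created from .env.example during setup
    [ -f "$root/.env" ] || { echo "missing: $root/.env" >&2; missing=1; }
    # SER model weights are expected in model_weight/
    [ -d "$root/model_weight" ] || { echo "missing: $root/model_weight" >&2; missing=1; }
    # virtualenv created by `virtualenv venv`
    [ -f "$root/venv/bin/activate" ] || { echo "missing: $root/venv/bin/activate" >&2; missing=1; }
    return "$missing"
}
```

For example, `check_setup "$PROJECT_ROOT" && fastapi run main.py --host 0.0.0.0 --port 7123` starts the server only when all three paths are present.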