Speaker Diarization is a system that identifies and separates the speakers in an audio file, labeling each speaker and their timestamps. The model is integrated with an Automatic Speech Recognition (ASR) model to transcribe each speaker's audio. It takes a folder of audio files as input and produces a CSV file containing the speaker separation with the corresponding timestamps and transcriptions.

This work aids child rescue efforts by distinguishing victim and abuser voices, providing crucial evidence for court proceedings, and by distinguishing speakers from background noise during criminal investigations.
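As a sketch of the kind of CSV this pipeline produces (the column names and segment values below are illustrative assumptions, not the repository's exact schema), each row pairs a speaker label with a time span and the transcribed text:

```python
import csv
import io

# Hypothetical diarization + transcription results: (speaker, start_s, end_s, text).
# The tuple layout and column names are assumptions for illustration.
segments = [
    ("SPEAKER_00", 0.0, 4.2, "Hello, can you hear me?"),
    ("SPEAKER_01", 4.5, 7.9, "Yes, loud and clear."),
    ("SPEAKER_00", 8.1, 12.0, "Great, let's get started."),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["speaker", "start", "end", "transcription"])
writer.writerows(segments)

csv_text = buf.getvalue()
print(csv_text)
```

In the real application the CSV is written to the output directory you select in RescueBox.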
**Clone the Repository**

```shell
git clone https://github.com/UMass-Rescue/Audio-Diarization.git
cd Audio-Diarization
```
**Install Dependencies**

For the best results, create a virtual environment. You can use any method to create one; one way is:

```shell
python -m venv <virtual_env_name>
```

Activate the virtual environment.

On macOS/Linux:

```shell
source <virtual_env_name>/bin/activate
```

On Windows:

```shell
<virtual_env_name>\Scripts\activate
```

Install the required Python packages using the following command:

```shell
pip install -r requirements.txt
```
Make sure ffmpeg is installed on your system if you don't already have it.

On macOS, if you already have Homebrew, you can use the command below to install ffmpeg directly. If not, follow the Homebrew documentation to install Homebrew first, then run:

```shell
brew install ffmpeg
```

On Windows, download the ffmpeg executable (use the Windows build from gyan.dev), follow the instructions in the installer, and add ffmpeg to your environment variables to make it accessible globally.
**Access the model**

This step is no longer needed unless you want to run the model API directly. By default, this application runs the local model pipeline for pyannote/speaker-diarization-3.0.

```shell
huggingface-cli login
```

You will be prompted to enter an access token, which you can find at https://huggingface.co/settings/tokens.
**Running the Flask-ML Server**

Start the Flask-ML server to work with RescueBox for Audio Diarization:

```shell
python model_3endpoints.py
```

The server will start running on 127.0.0.1:5000.
Download and run RescueBox Desktop from the following link: Rescue Box Desktop

Open the RescueBox Desktop application and register the model.
On the left-hand side you can see three options: Speaker Diarization, Audio Transcription, and Speaker Diarization + Audio Transcription. Select an endpoint based on your needs: Speaker Diarization runs only the speaker separation with timestamps; Audio Transcription runs only the audio transcription; and Speaker Diarization + Audio Transcription performs the primary task of separating the speakers in an audio file with their timestamps and transcribed audio.

Set the input and output directories. The input directory should contain the audio file(s), and the output directory is where the CSV file with the speaker separation, timestamps, and transcriptions will be written. Once this is done, click 'Run Model' at the bottom.

Click the 'View' button to see the results. They will be displayed in the results section.
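The core step of the combined endpoint — attaching transcribed words to diarized speaker turns — can be sketched as a timestamp-overlap match. The turn and word structures below are illustrative assumptions, not the repository's actual data model:

```python
# Hypothetical diarization turns: (speaker, start_s, end_s).
turns = [("SPEAKER_00", 0.0, 5.0), ("SPEAKER_01", 5.0, 10.0)]

# Hypothetical ASR output: (word, start_s, end_s).
words = [("hello", 0.5, 1.0), ("there", 1.1, 1.5),
         ("hi", 5.2, 5.6), ("back", 5.7, 6.1)]

def assign_words(turns, words):
    """Attach each word to the speaker turn it overlaps the most."""
    result = {speaker: [] for speaker, _, _ in turns}
    for text, w_start, w_end in words:
        best, best_overlap = None, 0.0
        for speaker, t_start, t_end in turns:
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        if best is not None:
            result[best].append(text)
    return {spk: " ".join(ws) for spk, ws in result.items()}

transcript = assign_words(turns, words)
print(transcript)  # {'SPEAKER_00': 'hello there', 'SPEAKER_01': 'hi back'}
```

Words whose timestamps fall outside every turn are dropped here; a production pipeline would need a policy for those (e.g. an "UNKNOWN" speaker bucket).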