This research aims to classify multivariate time-series in order to perform speaker identification. We specifically investigate speech recordings from which linear prediction cepstral coefficients (LPCCs) have been extracted. Three classifiers are examined for this task: a simple Convolutional Neural Network (CNN) using 1D convolution, a Random Forest classifier and a Support Vector Machine. The latter two use hand-crafted features (the mean, standard deviation and slope of each time-series; see the sketch below). These classifiers were trained and tested on two different datasets, the Japanese Vowels dataset and the (English) Free Spoken Digit dataset, so their performance is evaluated in two different scenarios: the datasets vary in language, length and task. We find that the classifiers built on hand-crafted features outperform the neural network.
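As a minimal sketch of these hand-crafted features (the function name is illustrative, and "slope" is assumed here to mean the least-squares linear-fit slope of each channel over time; the repo may define it differently):

```python
import numpy as np

def summary_features(seq):
    """Collapse one multivariate sequence of shape (frames, channels) into a
    fixed-length vector: per-channel mean, standard deviation and slope."""
    t = np.arange(len(seq))
    # Slope of a degree-1 least-squares fit per channel (assumed definition).
    slopes = [np.polyfit(t, seq[:, ch], 1)[0] for ch in range(seq.shape[1])]
    return np.concatenate([seq.mean(axis=0), seq.std(axis=0), slopes])
```

This turns each variable-length recording into one fixed-length vector that the Random Forest and SVM can consume.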
- Install the Python requirements (Python 3.10.9).
- Run the `train.py` file: `python train.py`
Japanese Vowels Dataset

Size
- 640 speaker recordings
- 9 unique speakers
- Split:
  - Train: 270 recordings (30 per speaker)
  - Test: 370 recordings (24-88 per speaker)
Parameters
- LPCC order of 12
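For reference, a minimal loader for this dataset, assuming the UCI distribution (files `ae.train`/`ae.test`, where each utterance is a blank-line-separated block with one frame of 12 LPCCs per line; these file names and the format are assumptions, as they are not stated in this repo):

```python
import numpy as np

def load_japanese_vowels(path):
    """Parse one UCI Japanese Vowels file into a list of (frames, 12) arrays."""
    utterances, block = [], []
    with open(path) as f:
        for line in f:
            values = line.split()
            if values:
                block.append([float(v) for v in values])
            elif block:  # a blank line ends the current utterance
                utterances.append(np.array(block))
                block = []
    if block:
        utterances.append(np.array(block))
    return utterances

# train = load_japanese_vowels("ae.train")  # expected: 270 variable-length sequences
```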
Free Spoken Digit Dataset

Source

- Extract all zip files in spoken_digits
- One recording per txt file (see the extraction sketch below)
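A minimal way to do this extraction in Python (assuming the archives sit directly in a `spoken_digits/` directory and should be unpacked in place):

```python
import zipfile
from pathlib import Path

# Unpack every archive in spoken_digits/ next to where it was found,
# leaving one txt file per recording.
for archive in Path("spoken_digits").glob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
```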
Size
- 3000 speaker recordings
- 6 unique speakers
- 50 recordings of each digit per speaker
Parameters
- WAV files with a sample rate of 8 kHz (pretty low)
- Recordings are trimmed, so there is almost no silence at the start/end
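Loading one of these recordings is a one-liner; a sketch with `scipy` (the file name follows FSDD's `{digit}_{speaker}_{index}.wav` naming convention, but the exact path and bit depth are assumptions):

```python
from scipy.io import wavfile

rate, signal = wavfile.read("spoken_digits/0_jackson_0.wav")  # assumed path
assert rate == 8000  # FSDD recordings are sampled at 8 kHz
signal = signal / 32768.0  # scale 16-bit PCM samples to [-1, 1] (bit depth assumed)
```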
Feature Extraction

- LPCC (Linear Predictive Cepstral Coefficients):
  - Using `matlab_speech_features` functions to extract LPCCs (a rough Python analogue is sketched at the end of this section)
  - Source: https://github.com/jameslyons/matlab_speech_features
- Parameters:
  - Window size of 0.030 (30 ms)
  - Window step size of 0.015 (15 ms)
  - 12 cepstral coefficients (order)
- Additional MATLAB dependencies required for audio importing and analysis:
Source
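For readers without MATLAB, here is a rough Python analogue of the LPCC extraction described above (an assumed sketch, not the repo's pipeline: it uses `librosa.lpc` plus the standard LPC-to-cepstrum recursion, and omits any pre-emphasis or window tapering the MATLAB code may apply):

```python
import numpy as np
import librosa

def lpc_to_cepstrum(lpc_poly):
    """Convert an LPC polynomial [1, a_1, ..., a_p] into p cepstral
    coefficients via the standard LPC-to-cepstrum recursion."""
    a = -lpc_poly[1:]  # flip librosa's sign convention to predictor form
    p = len(a)
    c = np.zeros(p)
    for m in range(1, p + 1):
        c[m - 1] = a[m - 1] + sum((k / m) * c[k - 1] * a[m - k - 1]
                                  for k in range(1, m))
    return c

def extract_lpccs(signal, rate=8000, order=12, win=0.030, step=0.015):
    """Frame the signal (30 ms windows, 15 ms steps) and return an
    (n_frames, order) array of LPCCs, mirroring the parameters above."""
    signal = signal.astype(np.float64)
    frames = librosa.util.frame(signal,
                                frame_length=int(win * rate),
                                hop_length=int(step * rate)).T
    return np.array([lpc_to_cepstrum(librosa.lpc(frame, order=order))
                     for frame in frames])
```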