This project is part of the MATH UN2015 Linear Algebra & Probability course for the Fall 2024 semester at Columbia University.
The goal of this project is to develop a predictive model that classifies speaker attributes, specifically accent, gender, and age, based on audio data. The model leverages feature extraction techniques to capture relevant characteristics of the speaker's voice that contribute to these classifications.
Key features such as pitch, words per second, and tempo are extracted from the audio files using linear algebra-based algorithms.
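As an illustration of what a linear algebra-based pitch feature can look like, here is a minimal sketch that estimates the fundamental frequency of a signal by autocorrelation, where each autocorrelation value is an inner product of the signal with a shifted copy of itself. This is an assumption for illustration only: the function name `estimate_pitch`, the `fmin`/`fmax` search range, and the synthetic test tone are hypothetical and are not taken from the project's notebook, which may use a different method.

```python
import numpy as np

def estimate_pitch(signal, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of a mono signal via autocorrelation.

    Hypothetical helper for illustration; not the project's actual extractor.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove any DC offset

    # Only search lags that correspond to frequencies between fmin and fmax.
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    lags = np.arange(min_lag, max_lag + 1)

    # Autocorrelation at lag k: inner product of the signal with itself shifted by k.
    acf = np.array([np.dot(signal[:-k], signal[k:]) for k in lags])

    # The lag with the strongest self-similarity is taken as the pitch period.
    best_lag = lags[np.argmax(acf)]
    return sample_rate / best_lag

# Synthetic stand-in for a voice recording: a pure 200 Hz tone, 0.25 s at 8 kHz.
sample_rate = 8000
t = np.arange(2000) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)
pitch = estimate_pitch(tone, sample_rate)
```

Real speech is noisier than a pure tone, so in practice the estimate would typically be computed per short frame and then averaged or summarized across the clip.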
The project will train a Logistic Regression model using the extracted features to predict the target attributes: gender, age, and accent.
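The training step can be sketched as follows. This is a minimal example under stated assumptions: it fits a binary logistic regression with plain NumPy gradient descent rather than whichever library the notebook uses, and the feature values and two-class label are synthetic stand-ins, not data from the project. The learning rate and iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT project data): two features per clip,
# loosely imagined as pitch in Hz and words per second, for two classes.
n = 200
class0 = rng.normal([120.0, 2.5], [15.0, 0.4], size=(n, 2))
class1 = rng.normal([210.0, 2.7], [15.0, 0.4], size=(n, 2))
X = np.vstack([class0, class1])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Standardize each feature, then append a column of ones for the bias term.
X = (X - X.mean(axis=0)) / X.std(axis=0)
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the mean logistic (cross-entropy) loss.
w = np.zeros(Xb.shape[1])
learning_rate = 0.5
for _ in range(500):
    p = sigmoid(Xb @ w)               # predicted probability of class 1
    grad = Xb.T @ (p - y) / len(y)    # gradient of the mean loss w.r.t. w
    w -= learning_rate * grad

# Accuracy on the training data itself (no held-out split in this sketch).
pred = (sigmoid(Xb @ w) >= 0.5).astype(int)
accuracy = (pred == y).mean()
```

Attributes with more than two categories, such as accent, would need a multiclass extension (for example one-vs-rest or softmax), and a held-out test set would be needed for a meaningful performance estimate.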
- Audio file classification based on accent, gender, and age.
- Visualizations of intermediate steps, such as feature distribution and PCA results, to help interpret the model's performance.
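
For readers unfamiliar with the PCA step mentioned above, the following is a minimal sketch of PCA computed through the singular value decomposition. It assumes a feature matrix with one row per audio clip; the matrix here is random synthetic data, and the variable names are hypothetical rather than taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic feature matrix (NOT project data): 100 clips x 3 features,
# where the third feature is almost a copy of the first.
base = rng.normal(size=(100, 2))
features = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.05 * rng.normal(size=100),
])

# Center each feature so the SVD captures variance rather than the mean.
centered = features - features.mean(axis=0)

# Rows of Vt are the principal directions; S holds the singular values.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Share of total variance captured by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)

# Project every clip onto the first two principal components.
projected = centered @ Vt[:2].T
```

A scatter plot of the two columns of `projected`, colored by class label, is one common way to inspect how well the extracted features separate the classes.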
- Clone the repository:
  git clone <repository_url>
- Create the Anaconda environment:
  conda env create -f environment.yml
- Download the dataset from Kaggle and use it to replace the common-voice directory.
- Run the notebook 'un2015.ipynb'.
- Review the results in the report 'final_report.pdf'.