This repository contains the code, report and slides for the Python project I completed in spring 2017. The topic was dimensionality reduction through PCA (Principal Component Analysis) and NMF (Non-negative Matrix Factorization).
I chose to implement PCA from the ground up, without relying on any existing scientific libraries (no NumPy, SciPy, etc.). The underlying problem is to find the eigenvectors associated with the largest eigenvalues of a positive semi-definite matrix. The resulting code boils down to Householder reduction followed by QR iteration (carried out with Givens rotations). The implementation is 100% vanilla Python, and therefore much slower than the Fortran routines on which SciPy is based. It is nevertheless robust and yields good results in high dimensions, provided you are willing to wait long enough.
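To make the QR iteration idea concrete, here is a minimal sketch in vanilla Python. It is not the code from QR.py: it skips the Householder reduction, uses no shifts, and all names (`matmul`, `givens_qr`, `qr_iteration`, `cov`) as well as the 3x3 matrix are made up for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def givens_qr(a):
    """Factor a square matrix as Q * R with Givens rotations; returns (q, r)."""
    n = len(a)
    r = [row[:] for row in a]  # work on a copy, the input is left untouched
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        for i in range(j + 1, n):
            if r[i][j] == 0.0:
                continue  # entry already zero, no rotation needed
            h = math.hypot(r[j][j], r[i][j])
            c, s = r[j][j] / h, r[i][j] / h
            # Rotate rows j and i of r so that r[i][j] becomes zero.
            for k in range(n):
                rj, ri = r[j][k], r[i][k]
                r[j][k] = c * rj + s * ri
                r[i][k] = -s * rj + c * ri
            # Apply the transposed rotation to columns j and i of q,
            # which keeps the product q * r equal to the input matrix.
            for k in range(n):
                qj, qi = q[k][j], q[k][i]
                q[k][j] = c * qj + s * qi
                q[k][i] = -s * qj + c * qi
    return q, r

def qr_iteration(a, iterations=200):
    """Unshifted QR iteration on a symmetric matrix.

    Returns (eigenvalues, vectors); column k of vectors approximates the
    eigenvector for eigenvalues[k]. Assumes eigenvalues of distinct magnitude.
    """
    n = len(a)
    vectors = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        q, r = givens_qr(a)
        a = matmul(r, q)  # R * Q is similar to Q * R: same eigenvalues
        vectors = matmul(vectors, q)
    return [a[i][i] for i in range(n)], vectors

# Hypothetical symmetric positive definite matrix standing in for a covariance.
cov = [[4.0, 1.0, 0.0],
       [1.0, 3.0, 1.0],
       [0.0, 1.0, 2.0]]
values, vectors = qr_iteration(cov)
```

For this toy matrix the exact eigenvalues are 3 + √3, 3 and 3 − √3, so the result is easy to check by hand. A practical implementation would first reduce the matrix to tridiagonal form, which makes each QR step far cheaper, and a fixed iteration count is only a simplification.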
I didn't have time to implement NMF myself, so I used the implementation available in sklearn.decomposition. With it, I reproduced the results of Lee & Seung on Grolier encyclopedia articles from their foundational 1999 paper.
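For readers who want to see what the factorization actually computes, here is a minimal vanilla Python sketch of NMF with multiplicative updates for the squared-error objective. It is neither the project's code (which calls sklearn.decomposition) nor sklearn's implementation; the function names and the tiny document-by-word count matrix are invented for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix v by w * h with w, h non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start, so the updates never get stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of v - w * h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

# Hypothetical counts: 5 documents (rows) by 4 words (columns), two "topics".
counts = [[3.0, 2.0, 0.0, 0.0],
          [6.0, 4.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 5.0],
          [0.0, 0.0, 2.0, 10.0],
          [3.0, 2.0, 1.0, 5.0]]
w, h = nmf(counts, rank=2)
```

Each row of `h` can be read as a "topic" (a non-negative weighting over words) and each row of `w` as the mix of topics in a document, which is the kind of interpretation the encyclopedia experiment relies on.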
- report.pdf contains theoretical explanations behind the algorithms as well as practical results
- slides.pdf contains the oral presentation (final grade: 15/20)
- calcmat.py contains matrix calculus-related functions
- power.py contains a naive approach to PCA with power iteration
- QR.py contains my actual PCA implementation
- image.py applies PCA to compression of Lena's face
- NMF.py applies NMF to words in encyclopedia articles
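As an illustration of the naive approach mentioned for power.py, here is a minimal power iteration sketch in vanilla Python. It is not the contents of power.py; the function name, the fixed iteration count and the example matrix are assumptions made for this example.

```python
import math

def dominant_eigenpair(a, iterations=500):
    """Power iteration: approximate the largest eigenvalue of a symmetric
    positive semi-definite matrix and a unit eigenvector for it.

    Assumes the start vector is not orthogonal to the dominant eigenvector
    and that the matrix is not the zero matrix.
    """
    n = len(a)
    v = [1.0] * n
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T a v (v has unit length) estimates the eigenvalue.
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * av[i] for i in range(n))
    return eigenvalue, v

# Hypothetical symmetric positive definite matrix standing in for a covariance.
cov = [[4.0, 1.0, 0.0],
       [1.0, 3.0, 1.0],
       [0.0, 1.0, 2.0]]
top_value, top_vector = dominant_eigenpair(cov)
```

This only recovers one eigenpair at a time, and convergence slows down when the two largest eigenvalues are close, which is one reason to prefer a QR-based method for the full decomposition.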
The code and the report are written in French, but an English translation should follow shortly.