This started as a small project to learn some TypeScript and work out implementing a few statistical analyses, but it also functions as a nice log of the various derivations and theorems I have been reading and working through in a notebook.
Thus, I have decided to document it a little more thoroughly and use it as a proof of concept for my excursions into numerical computing. While thin at the moment (and not the most robust in its approaches), the idea is that this repository will continue to grow as I hand-implement more algorithms and make existing ones more robust (for example, other optimization approaches, non-closed-form LR, etc.).
The original TypeScript univariate implementation still exists inside of the original src directory, but it has been sidelined in favor of a Python backend and a Java/TypeScript front end for visualization and data loading.
Outlined below are the different algorithms I have implemented so far. Mostly these are regressions and GLMs, the bread and butter, working out from there. More than likely I will add other things like small unsupervised methods (K-means clustering, PCA) and perhaps some supervised methods like decision trees, before moving on to more complex methods.
The code itself for the current algorithms lives inside of

`Learning/Learning/`

and the previous implementations of univariate regression, as well as the front end that still needs to be worked out, live inside of

`Learning/src/`

There are examples at the bottom of each Python module inside the `"__main__"` block that can be run to test out each implementation, if anyone perusing this file is curious. There are, of course, unlisted dependencies (numpy/scipy/scikit-learn), but this is not really meant for entirely public use at the moment, so for now I leave it up to the user to pip/conda install their way to success.
I have implemented linear regression as a multiple regression, which also functions as a simple (univariate) regression when one predictor is provided. In linear regression we model a dependent variable (our target/response variable) as a function of one or more predictor variables.
(1):

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i
$$

or

(2):

$$
y = X\beta + \varepsilon
$$
This implementation uses the linear algebra form for ease of implementation, where X is our design matrix. A design matrix is a matrix where each row is one of our observations i and each column is one of our features k. Importantly, the first column of the design matrix is all 1's, so that $\beta_0$, our intercept, is constant across observations (which is also what distinguishes it from a plain feature matrix).
$\beta$ is our coefficient vector, which contains a coefficient for each one of our predictors. Here the optimal coefficients can be determined with the closed form solution

(3):

$$
\hat{\beta} = (X^TX)^{-1}X^Ty
$$
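As a rough sketch of what equation (3) boils down to in code (the function name and data here are illustrative, not the repository's actual API):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Fit multiple linear regression via the closed-form normal equation (3)."""
    # Prepend a column of 1's so the first coefficient is the intercept B0.
    X_design = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) B = X^T y; np.linalg.solve is more stable than an explicit inverse.
    beta_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
    return beta_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                                        # two predictors
    y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)
    print(fit_linear_regression(X, y))                                   # ~[1.5, 2.0, -0.5]
```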
Binary logistic regression models the probability of an event happening as a function of a linear combination of its predictors. Here the "event" is the response/target variable taking on the label it does, which is also the reason we use this method for classification.
The equation for a logistic regression is as follows:
(4):

$$
\hat{y} = \sigma(X\beta) = \frac{1}{1 + e^{-X\beta}}
$$
Here $\hat{y}$ is our predicted probability $P(y=1 \mid x)$. Ideally we would want to maximize the probability of our particular outcomes occurring (and therefore have the most accurate model). Fortunately, since we are working with binary outcomes, we can assume a Bernoulli distribution and therefore model our objective as a maximum likelihood estimation, where

(5):

$$
L(\beta) = \prod_{i=1}^{n} \hat{y}_i^{\,y_i}(1-\hat{y}_i)^{1-y_i}
$$
However, in practice, multiplying probabilities like that leads to very small numbers and numerical underflow, so we typically take the negative log likelihood (also called "cross entropy").
(6):

$$
\ell(\beta) = -\sum_{i=1}^{n}\left[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]
$$

or

$$
\ell(\beta) = -\left[y^T\log(\hat{y}) + (1-y)^T\log(1-\hat{y})\right]
$$
Because of the sigmoid function we do not automatically get an easy-to-use closed-form solution, so we must use numerical methods to compute our optimal coefficients.
Fortunately the negative log-likelihood is convex in $\beta$ and therefore has a global minimum, so I implemented gradient descent (steepest descent) with T iterations, where our gradient is defined as:

$$
\nabla_B\ell(\beta) = X^T(\hat{y} - y)
$$

and our update rule is:

$$
\beta_{t+1} = \beta_t - \alpha\, X^T(\hat{y}_t - y)
$$

where $\hat{y}_t = \sigma(X\beta_t)$ and $\alpha$ is the learning rate.
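A minimal sketch of that loop in NumPy (illustrative names, hyperparameters, and simulated data, not the repository's actual interface; the gradient is averaged here, which only rescales the step size):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=5000):
    """Fit binary logistic regression by gradient descent on the negative log-likelihood."""
    X_design = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(X_design.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(X_design @ beta)               # equation (4)
        grad = X_design.T @ (y_hat - y)                # gradient of the NLL in (6)
        beta -= lr * grad / len(y)                     # averaged gradient = rescaled step size
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    p = sigmoid(-0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1])
    y = rng.binomial(1, p)                             # simulate binary labels
    print(fit_logistic_regression(X, y))               # roughly [-0.5, 1.0, -2.0]
```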
Currently this was tested on the mpg dataset found in scipy and the iris dataset inside of sklearn. These are pretty standard classroom datasets I used throughout undergrad, so I thought it fitting to use them as the basis for testing functionality before a couple of real use cases.
A Poisson regression is typically used to model count data and assumes that the data follows a Poisson distribution. In a Poisson process we are counting the number of events per unit of time or space, and the number of events depends only on the length or size of the interval.
Thus, assuming our data is ~Poisson, we can take advantage of the PMF for a Poisson distribution:

$$
P(Y=y) = \frac{\lambda^{y} e^{-\lambda}}{y!}
$$

where our main parameter $\lambda = E(Y)$ is the average rate of events per unit of time or space.

Using the PMF we model our maximum likelihood estimation as follows:

$$
L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}
$$

Of course we take the negative log likelihood, both to minimize numerical underflow from multiplying many small probabilities and to frame this as a minimization problem:

$$
\ell(\lambda) = -\sum_{i=1}^{n}\left[y_i\log(\lambda) - \lambda - \log(y_i!)\right]
$$
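As a quick, purely illustrative sanity check on that expression (not necessarily something the repository's modules do), the hand-written negative log-likelihood should match what `scipy.stats.poisson.logpmf` gives:

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

y = np.array([0, 2, 1, 4, 3])        # some example counts
lam = 2.0                            # a candidate rate

# Hand-written NLL: -sum_i [ y_i*log(lambda) - lambda - log(y_i!) ], using gammaln(y+1) = log(y!)
nll_manual = -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

# Same quantity via scipy's Poisson log-PMF
nll_scipy = -np.sum(poisson.logpmf(y, lam))

print(nll_manual, nll_scipy)         # should agree to floating-point precision
```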
In a regression context we would say that each observation $i$ has its own rate $\lambda_i$ that depends on its covariates (how exactly we link $\lambda_i$ to the predictors is covered below).
We have 3 main assumptions for a Poisson regression:

- The response variable is a count per unit of time or space, described by a Poisson distribution
- The observations must be independent of one another
- By definition, the mean of a Poisson random variable must be equal to its variance, where $E(Y)=\lambda$ and $SD(Y)=\sqrt{\lambda}$
So why not just use a linear model? Well, our data come in the form of counts and therefore, in theory, have a minimum possible value of 0 and no upper bound. The problem is that if we tried to model our main parameter (the average rate) directly as a linear function of one or more covariates,

$$
\lambda_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},
$$

the response would violate the assumptions of the linear model: the variance of a count grows with its mean (so the equal-variance assumption fails), and nothing stops the fitted rate from going negative.

So what can we do? Well, one way to avoid these issues is to simply model

$$
\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
$$

(or, in the linear algebra notation I've been using so far,)

$$
\log(\lambda) = X\beta \quad\Longleftrightarrow\quad \lambda = \exp(X\beta)
$$

This opens the possibility for the linear predictor to take any value from negative infinity to infinity while the mean increases smoothly and stays positive, which also accommodates the increasing variance (notice the exp form makes this explicit for us).
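A tiny illustrative snippet (made-up numbers) of why the log link helps: the identity link can produce impossible negative rates, while exponentiating the linear predictor always yields a positive mean.

```python
import numpy as np

# A toy design matrix with an intercept column and one predictor spanning negative to positive values.
X = np.column_stack([np.ones(5), np.linspace(-3, 3, 5)])
beta = np.array([0.2, 0.8])

eta = X @ beta          # linear predictor, ranges over all reals
print(eta)              # some entries are negative -> invalid as Poisson rates
print(np.exp(eta))      # exp(X @ beta): always > 0 and grows smoothly with eta
```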
Fortunately, as with logistic regression, our negative log likelihood is convex in $\beta$, meaning there is one global minimum (unique under standard conditions).
This allows us to once again implement gradient descent as our optimization algorithm of choice. First we must calculate our gradient (the vector of first derivatives), which tells us the direction of steepest ascent (we step in the opposite direction to descend):

$$
\nabla_B\ell(\beta)=X^T(\exp{(X\beta)}-y)
$$
Then we can define our update rule as

$$
\beta_{t+1} = \beta_t - \alpha\, X^T(\exp{(X\beta_t)} - y)
$$

and get underway with implementation.
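And a minimal sketch of that descent loop in NumPy (again with illustrative names, hyperparameters, and simulated data rather than the repository's actual API; the gradient is averaged, which only rescales the learning rate):

```python
import numpy as np

def fit_poisson_regression(X, y, lr=0.01, n_iters=10000):
    """Fit a Poisson regression (log link) by gradient descent on the negative log-likelihood."""
    X_design = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(X_design.shape[1])
    for _ in range(n_iters):
        lam = np.exp(X_design @ beta)                  # lambda_i = exp(x_i^T beta)
        grad = X_design.T @ (lam - y)                  # gradient from the section above
        beta -= lr * grad / len(y)                     # averaged gradient = rescaled step size
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1))
    y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 0]))       # simulate counts with known coefficients
    print(fit_poisson_regression(X, y))                # ~[0.5, 0.8]
```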
I also plan on adding some kind of TypeScript front-end GUI/chart displayer, mostly to get some practice using JavaScript/TypeScript.
There is actually already a univariate regression implemented in TypeScript, written before I realized that there aren't many good vectorized math packages for Node.js (aside from TensorFlow, which came with its own suite of problems) and that the language really isn't meant for that anyway, but it gave me a solid foundation thus far.
Perhaps I will also add some MySQL functionality to pull in datasets, although it's hard to really use SQL without a proper database connection via something like Microsoft Azure.
For now I plan on continuing to implement mostly GLMs, to grow my clinical-research-relevant skillset, but I will also try out things like K-means and SVD-based PCA as well as other generalizations.
In the near future:
- k-means
- PCA
- negative binomial regression
In case you'd like to know where I'm pulling all this from, or if you'd like to read along:
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. http://www.deeplearningbook.org/
- Roback, P., & Legler, J. (2021). Beyond multiple linear regression: Applied generalized linear models and multilevel models in R. CRC Press. Retrieved from https://bookdown.org/roback/bookdown-BeyondMLR/
- Wikipedia contributors. (n.d.). Poisson regression. In Wikipedia, The Free Encyclopedia. Retrieved January 21, 2026, from https://en.wikipedia.org/wiki/Poisson_regression