
Commit d54f633

Begin 1.2.2 notes
1 parent fe8ce5c commit d54f633

3 files changed: 102 additions & 0 deletions

_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ parts:
   - file: classes/cs483/2-probability-univariate-models
   - file: classes/cs483/3-probability-multivariate-models
   - file: classes/cs483/4-statistics
+  - file: classes/cs483/9-linear-discriminant-analysis
   - file: classes/cs483/13-neural-networks-tabular
   - file: classes/cs483/16-exemplar-based-methods
   - file: classes/cs491/overview

classes/cs483/1-intro.md

Lines changed: 30 additions & 0 deletions
@@ -72,6 +72,36 @@ This can be minimized to compute the **maximum likelihood estimate (MLE)**.

### 1.2.2 - Regression

If we want to predict a real-valued quantity $y \in \mathbb{R}$ instead of a class label $y \in \{ 1, \dots, C \}$, this is known as **regression**.
Regression is very similar to classification, but we need to use a different loss function. The most common choice is quadratic loss:

$$ \ell_2(y, \hat{y}) = (y - \hat{y})^2 $$

The empirical risk when using quadratic loss is equal to the **mean squared error (MSE)**:

$$ \text{MSE}(\boldsymbol{\theta}) = \frac{1}{N} \sum^N_{n=1} (y_n - f(\boldsymbol{x}_n; \boldsymbol{\theta}))^2 $$

In regression problems, we typically assume that the output distribution is normal.
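
As a quick sanity check (my own toy numbers, not from the text), the MSE is straightforward to compute directly from a set of targets and predictions:

```python
import numpy as np

# Hypothetical observed targets y_n and model predictions f(x_n; theta).
y = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Empirical risk under quadratic loss = mean squared error.
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.375
```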
#### Linear Regression

A **simple linear regression (SLR)** model takes the following form:

$$ f(x; \boldsymbol{\theta}) = \beta_0 + \beta_1 x $$

We can adjust $\beta_0$ and $\beta_1$ to find the values that minimize the squared errors.
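
A minimal sketch of this fit (the data here is made up for illustration), using the standard closed-form least-squares solution for the two coefficients:

```python
import numpy as np

# Hypothetical 1-D data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares solution for simple linear regression:
# beta_1 = cov(x, y) / var(x),  beta_0 = mean(y) - beta_1 * mean(x).
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

# Predictions and the resulting MSE of the fitted line.
y_hat = beta_0 + beta_1 * x
mse = np.mean((y - y_hat) ** 2)
print(beta_0, beta_1, mse)
```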
If we have multiple input features, we can use a **multiple linear regression (MLR)** model:

$$ f(\boldsymbol{x}; \boldsymbol{\theta}) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n $$
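
A sketch for the multi-feature case (again with made-up data): stacking a column of ones onto the inputs lets the intercept $\beta_0$ be estimated along with the other coefficients by ordinary least squares.

```python
import numpy as np

# Hypothetical design: N = 6 examples with 2 input features each.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.0, 4.1, 11.2, 10.1, 17.0, 16.2])

# Prepend a column of ones so beta_0 (the intercept) is learned too.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution to X_design @ beta ~ y.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # [beta_0, beta_1, beta_2]
```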
#### Polynomial Regression

#### Deep Neural Networks

### 1.2.3 - Overfitting and generalization
classes/cs483/9-linear-discriminant-analysis.md

Lines changed: 71 additions & 0 deletions

@@ -0,0 +1,71 @@
# 9 - Linear Discriminant Analysis
## 9.1 - Introduction

In this chapter, we consider models of the following form:

$$ p(y = c | \boldsymbol{x}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{x} | y = c, \boldsymbol{\theta})p(y = c | \boldsymbol{\theta})}{\sum_{c'} p(\boldsymbol{x} | y = c', \boldsymbol{\theta}) p(y = c' | \boldsymbol{\theta})} $$

The term $p(y = c | \boldsymbol{\theta})$ is the prior over class labels, and the term $p(\boldsymbol{x} | y = c, \boldsymbol{\theta})$ is called the **class conditional density** for class $c$.
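
A toy sketch of how this posterior is computed (the priors, means, and shared covariance are made up, and the choice of Gaussian class-conditional densities via SciPy is my assumption here, not something stated at this point in the chapter):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-class generative model with Gaussian class-conditional densities.
priors = np.array([0.6, 0.4])                        # p(y = c | theta)
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]  # class-conditional means
cov = np.eye(2)                                       # shared covariance, for simplicity

def posterior(x):
    # p(x | y = c, theta) for each class c.
    likelihoods = np.array([multivariate_normal.pdf(x, mean=m, cov=cov) for m in means])
    joint = likelihoods * priors    # numerator of Bayes' rule
    return joint / joint.sum()      # normalize by summing over classes c'

print(posterior(np.array([1.0, 1.0])))  # equidistant point: posterior equals the priors
```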
## 9.2 - Gaussian discriminant analysis

### 9.2.1 - Quadratic decision boundaries

### 9.2.2 - Linear decision boundaries

### 9.2.3 - The connection between LDA and logistic regression

### 9.2.4 - Model fitting

### 9.2.5 - Nearest centroid classifier

### 9.2.6 - Fisher’s linear discriminant analysis *

## 9.3 - Naive Bayes classifiers

### 9.3.1 - Example models

### 9.3.2 - Model fitting

### 9.3.3 - Bayesian naive Bayes

### 9.3.4 - The connection between naive Bayes and logistic regression

## 9.4 - Generative vs discriminative classifiers

### 9.4.1 - Advantages of discriminative classifiers

### 9.4.2 - Advantages of generative classifiers

### 9.4.3 - Handling missing features

## 9.5 - Exercises
