
Commit d54f633

Begin 1.2.2 notes
1 parent fe8ce5c commit d54f633

3 files changed: 102 additions & 0 deletions

_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ parts:
   - file: classes/cs483/2-probability-univariate-models
   - file: classes/cs483/3-probability-multivariate-models
   - file: classes/cs483/4-statistics
+  - file: classes/cs483/9-linear-discriminant-analysis
   - file: classes/cs483/13-neural-networks-tabular
   - file: classes/cs483/16-exemplar-based-methods
   - file: classes/cs491/overview

classes/cs483/1-intro.md

Lines changed: 30 additions & 0 deletions
@@ -72,6 +72,36 @@ This can be minimized to compute the **maximum likelihood estimate (MLE)**.

### 1.2.2 - Regression

If we want to predict a real-valued quantity $y \in \mathbb{R}$ instead of a class label $y \in \{ 1, \dots, C \}$, this is known as **regression**.
Regression is very similar to classification, but we need to use a different loss function. The most common choice is quadratic loss:

$$ \ell_2(y, \hat{y}) = (y - \hat{y})^2 $$

The empirical risk when using quadratic loss is equal to the **mean squared error (MSE)**:

$$ \text{MSE}(\boldsymbol{\theta}) = \frac{1}{N} \sum^N_{n=1} (y_n - f(\boldsymbol{x}_n; \boldsymbol{\theta}))^2 $$

In regression problems, we typically assume that the output distribution is normal.
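
As a quick sanity check (my own toy numbers, not from the text), the MSE is straightforward to compute directly from a set of targets and predictions:

```python
import numpy as np

# Hypothetical observed targets y_n and model predictions f(x_n; theta).
y = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Empirical risk under quadratic loss = mean squared error.
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.375
```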
#### Linear Regression

A **simple linear regression (SLR)** model takes the following form:

$$ f(x; \boldsymbol{\theta}) = \beta_0 + \beta_1 x $$

We can adjust $\beta_0$ and $\beta_1$ to find the values that minimize the squared errors.
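
A minimal sketch of this fit (the data here is made up for illustration), using the standard closed-form least-squares solution for the two coefficients:

```python
import numpy as np

# Hypothetical 1-D data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares solution for simple linear regression:
# beta_1 = cov(x, y) / var(x),  beta_0 = mean(y) - beta_1 * mean(x).
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

# Predictions and the resulting MSE of the fitted line.
y_hat = beta_0 + beta_1 * x
mse = np.mean((y - y_hat) ** 2)
print(beta_0, beta_1, mse)
```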
If we have multiple input features, we can use a **multiple linear regression (MLR)** model:

$$ f(\boldsymbol{x}; \boldsymbol{\theta}) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n $$
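
A sketch for the multi-feature case (again with made-up data): stacking a column of ones onto the inputs lets the intercept $\beta_0$ be estimated along with the other coefficients by ordinary least squares.

```python
import numpy as np

# Hypothetical design: N = 6 examples with 2 input features each.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.0, 4.1, 11.2, 10.1, 17.0, 16.2])

# Prepend a column of ones so beta_0 (the intercept) is learned too.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution to X_design @ beta ~ y.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # [beta_0, beta_1, beta_2]
```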
#### Polynomial Regression

#### Deep Neural Networks

### 1.2.3 - Overfitting and generalization
classes/cs483/9-linear-discriminant-analysis.md

Lines changed: 71 additions & 0 deletions

@@ -0,0 +1,71 @@
# 9 - Linear Discriminant Analysis
## 9.1 - Introduction

In this chapter, we consider models of the following form:

$$ p(y = c | \boldsymbol{x}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{x} | y = c, \boldsymbol{\theta})p(y = c | \boldsymbol{\theta})}{\sum_{c'} p(\boldsymbol{x} | y = c', \boldsymbol{\theta}) p(y = c' | \boldsymbol{\theta})} $$

The term $p(y = c | \boldsymbol{\theta})$ is the prior over class labels, and the term $p(\boldsymbol{x} | y = c, \boldsymbol{\theta})$ is called the **class conditional density** for class $c$.
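
A toy sketch of how this posterior is computed (the priors, means, and shared covariance are made up, and the choice of Gaussian class-conditional densities via SciPy is my assumption here, not something stated at this point in the chapter):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-class generative model with Gaussian class-conditional densities.
priors = np.array([0.6, 0.4])                        # p(y = c | theta)
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]  # class-conditional means
cov = np.eye(2)                                       # shared covariance, for simplicity

def posterior(x):
    # p(x | y = c, theta) for each class c.
    likelihoods = np.array([multivariate_normal.pdf(x, mean=m, cov=cov) for m in means])
    joint = likelihoods * priors    # numerator of Bayes' rule
    return joint / joint.sum()      # normalize by summing over classes c'

print(posterior(np.array([1.0, 1.0])))  # equidistant point: posterior equals the priors
```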
## 9.2 - Gaussian discriminant analysis

### 9.2.1 - Quadratic decision boundaries

### 9.2.2 - Linear decision boundaries

### 9.2.3 - The connection between LDA and logistic regression

### 9.2.4 - Model fitting

### 9.2.5 - Nearest centroid classifier

### 9.2.6 - Fisher’s linear discriminant analysis *

## 9.3 - Naive Bayes classifiers

### 9.3.1 - Example models

### 9.3.2 - Model fitting

### 9.3.3 - Bayesian naive Bayes

### 9.3.4 - The connection between naive Bayes and logistic regression

## 9.4 - Generative vs discriminative classifiers

### 9.4.1 - Advantages of discriminative classifiers

### 9.4.2 - Advantages of generative classifiers

### 9.4.3 - Handling missing features

## 9.5 - Exercises
