From 58f181c3a13b1521888ae3ebbee4e54ae3dc14b9 Mon Sep 17 00:00:00 2001
From: Srihari Thyagarajan
Date: Thu, 3 Apr 2025 13:54:00 +0530
Subject: [PATCH] Fix quotes

Signed-off-by: Srihari Thyagarajan
---
 en/part5/log_regression/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/en/part5/log_regression/index.html b/en/part5/log_regression/index.html
index a12798fd..b76ea1ab 100644
--- a/en/part5/log_regression/index.html
+++ b/en/part5/log_regression/index.html
@@ -353,7 +353,7 @@
 \end{align*}
 Using these equations for probability of $Y|X$ we can create an algorithm that selects values of theta that maximize that probability for all data. I am first going to state the log probability function and partial derivatives with respect to theta. Then later we will (a) show an algorithm that can chose optimal values of theta and (b) show how the equations were derived.
-An important thing to realize is that: given the best values for the parameters ($\theta$), logistic regression often can do a great job of estimating the probability of different class labels. However, given bad , or even random, values of $\theta$ it does a poor job. The amount of ``intelligence" that you logistic regression machine learning algorithm has is dependent on having good values of $\theta$.
+An important thing to realize is that: given the best values for the parameters ($\theta$), logistic regression often can do a great job of estimating the probability of different class labels. However, given bad , or even random, values of $\theta$ it does a poor job. The amount of "intelligence" that you logistic regression machine learning algorithm has is dependent on having good values of $\theta$.

Notation

Before we get started I want to make sure that we are all on the same page with respect to notation. In logistic regression, $\theta$ is a vector of parameters of length $m$ and we are going to learn the values of those parameters based off of $n$ training examples. The number of parameters should be equal to the number of features of each datapoint.