Regularized Linear Regression and Logistic Regression Cheatsheet
Original Source: https://www.coursera.org/learn/machine-learning
Notations
I will write a scalar as $x$, a vector as $\vec{x}$, and a matrix as $X$.
Basics
$m$: the number of training examples
$n$: the number of features
$k$: the number of output classes
$\alpha$: learning rate
$\lambda$: regularization parameter
$X$: the input matrix, where each row is a training example and each column is a feature. Note that the first column $X_1$ is composed only of 1s (the shapes are illustrated in the sketch after this list).
$X^{(i)}$: the row vector of all the feature inputs of the ith training example
$X_j$: the column vector of the jth feature across all training examples.
$X_j^{(i)}$: value of feature j in the ith training example.
For Linear Regression and Binary Classification
$\vec{y}$: the output column vector, where each row corresponds to a training example.
$\vec{\theta}$: column vector of weights
For Binary Classification
$\vec{z}$: the output vector of the regression step and the input vector of the sigmoid function.
For Multiclass Classification
$Y$: the output matrix, where each row corresponds to a training example and each column to a class.
$Z$: the output matrix of the regression step and the input matrix of the softmax function.
$\Theta$: (n+1) by k matrix of weights
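To make the shapes concrete, here is a minimal NumPy sketch of the objects defined above, using a small made-up dataset; the sizes and variable names are my own, not part of the original.

```python
import numpy as np

# Assumed toy sizes: m = 4 training examples, n = 2 features, k = 3 classes.
m, n, k = 4, 2, 3
rng = np.random.default_rng(0)

features = rng.normal(size=(m, n))           # raw feature values
X = np.hstack([np.ones((m, 1)), features])   # prepend the column of 1s -> shape (m, n + 1)

y = rng.integers(0, 2, size=(m, 1))          # binary labels as a column vector, shape (m, 1)
theta = np.zeros((n + 1, 1))                 # weight column vector for (logistic) regression

Y = np.eye(k)[rng.integers(0, k, size=m)]    # one-hot label matrix, shape (m, k)
Theta = np.zeros((n + 1, k))                 # weight matrix for multiclass classification

print(X.shape, y.shape, theta.shape, Y.shape, Theta.shape)
# (4, 3) (4, 1) (3, 1) (4, 3) (3, 3)
```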
Regularized Linear Regression
Hypothesis
\[h_{\vec{\theta}}(X)=X\vec{\theta}\]
Cost
\[J(\vec{\theta}) = \frac{1}{2m}\Big((h_{\vec{\theta}}(X)-\vec{y})^T(h_{\vec{\theta}}(X)-\vec{y})+\lambda\vec{\theta}^T\vec{\theta}\Big)\]
Gradient Descent
\[\vec{\theta} := \vec{\theta} - \frac{\alpha}{m}\Big(X^T (h_{\vec{\theta}}(X)-\vec{y})+\lambda\vec{\theta}\Big)\]
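A minimal NumPy sketch of the three formulas above, assuming the shapes from the notation section; the function names are my own. Note that, exactly as the formulas are written, $\theta_0$ is regularized along with the other weights.

```python
import numpy as np

def linreg_hypothesis(X, theta):
    """h_theta(X) = X theta: an (m, 1) column vector of predictions."""
    return X @ theta

def linreg_cost(X, y, theta, lam):
    """Regularized squared-error cost, matching the vectorized formula above."""
    m = X.shape[0]
    err = linreg_hypothesis(X, theta) - y
    reg = lam * (theta.T @ theta).item()      # regularizes every component, including theta_0
    return ((err.T @ err).item() + reg) / (2 * m)

def linreg_gradient_step(X, y, theta, alpha, lam):
    """One gradient-descent update of theta."""
    m = X.shape[0]
    grad = X.T @ (linreg_hypothesis(X, theta) - y) + lam * theta
    return theta - (alpha / m) * grad
```

Calling `linreg_gradient_step` repeatedly, for a fixed number of iterations or until the cost stops decreasing, gives the usual batch gradient-descent loop.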
Regularized Logistic Regression
Binary Classification
Hypothesis
\[h_{\vec{\theta}}(X) = g(X\vec{\theta})\] \[g(\vec{z}) = \frac{1}{1 + e^{-\vec{z}}}\]
Cost
\[J(\vec{\theta})=-\frac{1}{m}\Big(\vec{y}^T\log\big(h_{\vec{\theta}}(X)\big)+(1-\vec{y})^T\log\big(1-h_{\vec{\theta}}(X)\big)\Big)+\frac{\lambda}{2m}\vec{\theta}^T\vec{\theta}\]
Gradient Descent
\[\vec{\theta} := \vec{\theta} - \frac{\alpha}{m}\Big(X^T (h_{\vec{\theta}}(X)-\vec{y})+\lambda\vec{\theta}\Big)\]
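The corresponding sketch for binary logistic regression, again under the same shape assumptions and with made-up function names; the gradient-descent update has the same form as the linear-regression one.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def logreg_hypothesis(X, theta):
    """h_theta(X) = g(X theta): an (m, 1) vector of predicted probabilities."""
    return sigmoid(X @ theta)

def logreg_cost(X, y, theta, lam):
    """Regularized cross-entropy cost, matching the formula above."""
    m = X.shape[0]
    h = logreg_hypothesis(X, theta)
    ll = (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h)).item()
    return -ll / m + (lam / (2 * m)) * (theta.T @ theta).item()

def logreg_gradient_step(X, y, theta, alpha, lam):
    """One gradient-descent update of theta."""
    m = X.shape[0]
    grad = X.T @ (logreg_hypothesis(X, theta) - y) + lam * theta
    return theta - (alpha / m) * grad
```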
Multiclass Classification
Hypothesis
\[h_\Theta(X) = g(X\Theta)\] \[g(Z^{(i)}) = \frac{e^{Z^{(i)}}}{\sum e^{Z^{(i)}}}\]
where $\sum \vec{a}$ is the sum of all elements of vector $\vec{a}$, and $g$ is applied to each row $Z^{(i)}$ of $Z = X\Theta$.
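A rough NumPy sketch of the row-wise softmax hypothesis above; the function names are my own, and subtracting the row maximum before exponentiating is a standard numerical-stability trick rather than part of the formula (it does not change the result).

```python
import numpy as np

def softmax_rows(Z):
    """Apply g(Z^{(i)}) = exp(Z^{(i)}) / sum(exp(Z^{(i)})) to each row of Z."""
    Z_shifted = Z - Z.max(axis=1, keepdims=True)   # stability trick; softmax is shift-invariant
    expZ = np.exp(Z_shifted)
    return expZ / expZ.sum(axis=1, keepdims=True)

def multiclass_hypothesis(X, Theta):
    """h_Theta(X) = g(X Theta): an (m, k) matrix whose rows sum to 1."""
    return softmax_rows(X @ Theta)

# Predicted class for each example: the column with the largest probability.
# predictions = multiclass_hypothesis(X, Theta).argmax(axis=1)
```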