Forward and Backward Propagation in Binary Logistic Regression
Original Source: https://www.coursera.org/specializations/deep-learning
We use forward propagation to compute our prediction and cost, and backward propagation (‘backprop’) to compute the derivatives needed for gradient descent.
Notation
$X$: training examples stacked top to bottom ($R^{m\times n}$)
$x^{(i)}$: row vector for example $i$ ($R^{1\times n}$); $a^{(i)}, z^{(i)}$: the corresponding scalars for example $i$
$w$: weight vector ($R^{n\times 1}$)
$b$: bias ($R$)
$z$: output of linear transformation ($R^{m\times 1}$)
$a$: prediction ($R^{m\times 1}$)
$J$: cost ($R$)
Forward Propagation
\[z=Xw+b\]
\[a=\frac{1}{1+e^{-z}}\]
\[J(w,b)=\frac{1}{m} \sum_{i=1}^m\left(-y^{(i)}\log(a^{(i)})-(1-y^{(i)})\log(1-a^{(i)})\right)\]
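To make the shapes concrete, here is a minimal NumPy sketch of the forward pass under the conventions above (the `sigmoid` and `forward` names and the code itself are illustrative, not taken from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, y, w, b):
    """Forward propagation for binary logistic regression.
    X: (m, n) examples stacked as rows, y: (m, 1) labels in {0, 1},
    w: (n, 1) weights, b: scalar bias.
    Returns the predictions a of shape (m, 1) and the scalar cost J."""
    z = X @ w + b                                             # linear transformation, (m, 1)
    a = sigmoid(z)                                            # predictions, (m, 1)
    J = np.mean(-y * np.log(a) - (1 - y) * np.log(1 - a))     # cross-entropy cost, scalar
    return a, J
```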
Backward Propagation
These are our gradient descent update rules:
\[w:=w-\alpha \frac{dJ}{dw}\]
\[b:=b-\alpha \frac{dJ}{db}\]
We calculate $dw = \frac{dJ}{dw}$ and $db = \frac{dJ}{db}$ with backpropagation.
By the chain rule,
\[\frac{dJ}{dw} = \frac{dz}{dw} \frac{da}{dz} \frac{dJ}{da}\]
\[\frac{dJ}{db} = \frac{dz}{db} \frac{da}{dz} \frac{dJ}{da}\]
When $m=1$
We will first derive backprop for the case of a single example and then generalize it to $m\geq1$.
When $m=1$, note that we call our cost the ‘loss’:
\[L(w,b)=-y\log(a)-(1-y)\log(1-a)\]
Backprop First Step
\[\frac{dL}{da} = -\frac{y}{a}+\frac{1-y}{1-a} = \frac{y-a}{a(a-1)}\]
Backprop Second Step
\[\frac{dL}{dz} = \frac{da}{dz} \frac{dL}{da} = \frac{e^{-z}}{(1+e^{-z})^2} \times \frac{y-a}{a(a-1)} = a(1-a) \times \frac{a-y}{a(1-a)} = a-y\]
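The key identity used here is that the sigmoid's derivative can be written in terms of its own output, $\frac{da}{dz} = a(1-a)$. A quick numerical sanity check of this identity, assuming NumPy (illustrative code, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
a = sigmoid(z)
eps = 1e-6
# Centered-difference approximation of da/dz at z.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(numeric, a * (1 - a))   # the two values agree to about 1e-10
```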
Backprop Third Step
\[\frac{dL}{dw} = \frac{dz}{dw} \frac{dL}{dz} = (a-y)x^T\]
Note that $x\in R^{1\times n}$ and $a-y$ is a scalar, so $\frac{dL}{dw} = (a-y)x^T \in R^{n\times 1}$, which matches the shape of $w$.
\[\frac{dL}{db} = \frac{dz}{db} \frac{dL}{dz} = a-y\]
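These single-example formulas are easy to sanity-check against numerical derivatives. The sketch below does so for one random example, assuming NumPy; the `loss` helper and all variable names are hypothetical, not from the course:

```python
import numpy as np

def loss(x, y, w, b):
    """Single-example loss. x: (1, n) row vector, y: scalar label,
    w: (n, 1) weights, b: scalar bias."""
    a = 1.0 / (1.0 + np.exp(-(x @ w + b)))              # prediction, shape (1, 1)
    return (-y * np.log(a) - (1 - y) * np.log(1 - a)).item()

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=(1, n))
y = 1.0
w = rng.normal(size=(n, 1))
b = 0.3

# Analytic gradients from the formulas above.
a = (1.0 / (1.0 + np.exp(-(x @ w + b)))).item()
dw = (a - y) * x.T                                      # dL/dw, shape (n, 1)
db = a - y                                              # dL/db, scalar

# Numerical gradients via centered differences.
eps = 1e-6
db_num = (loss(x, y, w, b + eps) - loss(x, y, w, b - eps)) / (2 * eps)
dw_num = np.zeros_like(w)
for j in range(n):
    e = np.zeros_like(w)
    e[j] = eps
    dw_num[j] = (loss(x, y, w + e, b) - loss(x, y, w - e, b)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))    # both should be tiny (~1e-9)
```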
Generalized Backprop
Our cost is the mean of the losses over all examples.
\[J(w,b) = \frac{1}{m}\sum_{i=1}^mL(a^{(i)}, y^{(i)})\]
Now that we know $\frac{dL}{dw} = (a-y)x^T$ and $\frac{dL}{db} = a-y$, we can calculate the generalized version of backprop.
Since $J$ is the mean of the per-example losses, its derivatives are the means of the per-example derivatives:
\[\frac{dJ}{dw} = \frac{1}{m}\sum_{i=1}^m(a^{(i)}-y^{(i)})(x^{(i)})^T = \frac{1}{m} X^T(a-y)\]
\[\frac{dJ}{db} = \frac{1}{m}\sum_{i=1}^m(a^{(i)}-y^{(i)})\]
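Putting the forward pass, the vectorized gradients, and the update rules together, a full training loop might look like the following sketch (assuming NumPy; the `train` function and the synthetic-data usage are illustrative, not from the course):

```python
import numpy as np

def train(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for binary logistic regression.
    X: (m, n) examples stacked as rows, y: (m, 1) labels in {0, 1}."""
    m, n = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iters):
        # Forward propagation
        z = X @ w + b                        # (m, 1)
        a = 1.0 / (1.0 + np.exp(-z))         # (m, 1)
        # Backward propagation (vectorized gradients)
        dw = X.T @ (a - y) / m               # (n, 1), i.e. (1/m) X^T (a - y)
        db = np.mean(a - y)                  # scalar
        # Gradient descent update
        w = w - alpha * dw
        b = b - alpha * db
    return w, b

# Tiny usage example on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, :1] + X[:, 1:] > 0).astype(float)   # (200, 1) labels
w, b = train(X, y)
```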