Forward and Backward Propagation in Binary Logistic Regression

Original Source: https://www.coursera.org/specializations/deep-learning

We use forward propagation to compute our prediction and cost, and backward propagation (‘backprop’) to compute the derivatives needed for gradient descent.

Notations

$X$: the $m$ training examples stacked as rows, one example per row ($R^{m\times n}$, where $n$ is the number of features)

$x^{(i)}, a^{(i)}, z^{(i)}$: the row vector ($x^{(i)}$) or scalar ($a^{(i)}$, $z^{(i)}$) corresponding to example $i$

$w$: weight vector ($R^{n\times 1}$)

$b$: bias ($R$)

$z$: output of linear transformation ($R^{m\times 1}$)

$a$: prediction ($R^{m\times 1}$)

$J$: cost ($R$)

Forward Propagation

\[z=Xw+b\] \[a=\frac{1}{1+e^{-z}}\] \[J(w,b)=\frac{1}{m} \sum_{i=1}^m(-y^{(i)}\log(a^{(i)})-(1-y^{(i)})\log(1-a^{(i)}))\]
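
As a quick illustration, here is a minimal NumPy sketch of this forward pass (the function and variable names are my own, and the shapes follow the notation above; this is an assumed implementation, not code from the course):

```python
import numpy as np

def forward(X, w, b, y):
    """Forward pass for binary logistic regression.

    X: (m, n) examples stacked as rows, w: (n, 1) weights,
    b: scalar bias, y: (m, 1) labels in {0, 1}.
    """
    z = X @ w + b                                 # (m, 1) linear transformation
    a = 1.0 / (1.0 + np.exp(-z))                  # (m, 1) sigmoid predictions
    J = np.mean(-y * np.log(a) - (1 - y) * np.log(1 - a))  # scalar cost
    return z, a, J
```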

Backward Propagation

These are our gradient descent update rules.

\[w:=w-\alpha\times \frac{dJ}{dw}\] \[b:=b-\alpha\times \frac{dJ}{db}\]

We calculate $dw$ and $db$ (shorthand for $\frac{dJ}{dw}$ and $\frac{dJ}{db}$) with backpropagation, using the chain rule.

\[\frac{dJ}{dw} = \frac{dz}{dw} \frac{da}{dz} \frac{dJ}{da}\] \[\frac{dJ}{db} = \frac{dz}{db} \frac{da}{dz} \frac{dJ}{da}\]

When $m=1$

We will first look at backprop for the case where there is only one example, and then generalize it to $m\geq1$.

When $m=1$, note that we call the cost for a single example the ‘loss’: \(L(w,b)=-y\log(a)-(1-y)\log(1-a)\)

Backprop First Step

\[\frac{dL}{da} = -\frac{y}{a}+\frac{1-y}{1-a} = \frac{-y(1-a)+a(1-y)}{a(1-a)} = \frac{a-y}{a(1-a)} = \frac{y-a}{a(a-1)}\]

Backprop Second Step

\[\frac{dL}{dz} = \frac{da}{dz} \frac{dL}{da} = \frac{e^{-z}}{(1+e^{-z})^2} \times \frac{y-a}{a(a-1)} = a(1-a) \times \frac{y-a}{a(a-1)} = a-y\]
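
For reference, the identity $\frac{da}{dz}=a(1-a)$ used in the step above follows directly from differentiating the sigmoid:

\[\frac{da}{dz} = \frac{d}{dz}\left(\frac{1}{1+e^{-z}}\right) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}} = a(1-a)\]

since $\frac{e^{-z}}{1+e^{-z}} = 1-\frac{1}{1+e^{-z}} = 1-a$.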

Backprop Third Step

\[\frac{dL}{dw} = \frac{dz}{dw} \frac{dL}{dz} = x^T(a-y)\]

Note that $x\in R^{1\times n}$ and $a-y$ is a scalar, so the resulting $\frac{dL}{dw} = x^T(a-y) \in R^{n\times 1}$, the same shape as $w$.

\[\frac{dL}{db} = \frac{dz}{db} \frac{dL}{dz} = a-y\]
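
As a sanity check on these two expressions, the finite-difference sketch below compares them against numerical gradients for a single random example (all names and values here are made up for illustration):

```python
import numpy as np

def loss(x, w, b, y):
    """Loss L(w, b) for a single example: x is (1, n), w is (n, 1), y is a scalar."""
    a = float(1.0 / (1.0 + np.exp(-(x @ w + b))))
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

rng = np.random.default_rng(0)
n = 3
x = rng.normal(size=(1, n))
w = rng.normal(size=(n, 1))
b = 0.5
y = 1.0

# Analytic gradients from the formulas above
a = float(1.0 / (1.0 + np.exp(-(x @ w + b))))
dw_analytic = x.T * (a - y)     # (n, 1): dL/dw = x^T (a - y)
db_analytic = a - y             # scalar: dL/db = a - y

# Central-difference numerical gradients
eps = 1e-6
dw_numeric = np.zeros_like(w)
for j in range(n):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    dw_numeric[j, 0] = (loss(x, w_plus, b, y) - loss(x, w_minus, b, y)) / (2 * eps)
db_numeric = (loss(x, w, b + eps, y) - loss(x, w, b - eps, y)) / (2 * eps)

print(np.allclose(dw_analytic, dw_numeric, atol=1e-6),
      np.isclose(db_analytic, db_numeric, atol=1e-6))
```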

Generalized BackProp

Our cost is the mean of the losses over all examples.

\[J(w,b) = \frac{1}{m}\sum_{i=1}^mL(a^{(i)}, y^{(i)})\]

Now that we know $\frac{dL}{dw} = x^T(a-y)$ and $\frac{dL}{db} = a-y$, we can derive the generalized version of backprop by averaging the per-example gradients.

\[\frac{dJ}{dw} = \frac{1}{m}\sum_{i=1}^m\frac{dL(a^{(i)}, y^{(i)})}{dw} = \frac{1}{m}\sum_{i=1}^m(x^{(i)})^T(a^{(i)}-y^{(i)}) = \frac{1}{m} X^T(a-y)\] \[\frac{dJ}{db} = \frac{1}{m}\sum_{i=1}^m\frac{dL(a^{(i)}, y^{(i)})}{db} = \frac{1}{m}\sum_{i=1}^m(a^{(i)}-y^{(i)})\]
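
Putting it all together, a vectorized gradient descent step based on these formulas might look like the sketch below (again using my own hypothetical helper names and the shape conventions above, not code from the course):

```python
import numpy as np

def backward(X, a, y):
    """Vectorized gradients of the cost J with respect to w and b.

    X: (m, n) examples, a: (m, 1) predictions, y: (m, 1) labels.
    """
    m = X.shape[0]
    dw = X.T @ (a - y) / m          # (n, 1): dJ/dw = (1/m) X^T (a - y)
    db = float(np.mean(a - y))      # scalar: dJ/db = (1/m) sum_i (a^(i) - y^(i))
    return dw, db

def gradient_descent_step(X, y, w, b, alpha=0.1):
    """One gradient descent step: forward pass, backward pass, parameter update."""
    a = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward: (m, 1) predictions
    dw, db = backward(X, a, y)               # backward: gradients
    return w - alpha * dw, b - alpha * db
```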
