Forward and Backward Propagation in Neural Networks
Original Source: https://www.coursera.org/specializations/deep-learning
Notations
The input layer is the 0th layer, the first hidden layer is the 1st layer, and so on.
The term in square brackets is the dimensionality (shape) of that quantity.
$L$: total number of layers excluding input layer
$m$: number of training examples
$n^{[l]}$: number of units in lth layer
$W^{[l]}$: weight matrix of linear transformation that outputs lth layer [$n^{[l-1]}\times n^{[l]}$]
$b^{[l]}$: bias of linear transformation that outputs lth layer [$1 \times n^{[l]}$]
$Z^{[l]}$: linear transformation output in lth layer [$m\times n^{[l]}$]
$A^{[l]}$: matrix of units (activations) in lth layer [$m\times n^{[l]}$]
- $A^{[0]}$ equals $X$, the input matrix. When $l>0$, $A^{[l]}$ is the activation of $Z^{[l]}$.
- $A^{[L]}$ equals $\hat{Y}$ which is our prediction.
$g^{[l]}$: activation function of lth layer
$J$: cost
$dZ^{[l]}$ and $dW^{[l]}$ are abbreviations of $\frac{\partial J}{\partial Z^{[l]}}$ and $\frac{\partial J}{\partial W^{[l]}}$ respectively (and likewise for $db^{[l]}$ and $dA^{[l]}$).
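As a concrete illustration of these shapes, here is a minimal NumPy sketch of parameter initialization. The list `layer_dims` and the dictionary keys `W1`, `b1`, … are illustrative names, not part of the original notation.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """layer_dims = [n[0], n[1], ..., n[L]].

    Returns W[l] of shape (n[l-1], n[l]) and b[l] of shape (1, n[l]),
    matching the dimensions listed above.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        # Small random weights, zero biases.
        params[f"W{l}"] = rng.standard_normal((layer_dims[l - 1], layer_dims[l])) * 0.01
        params[f"b{l}"] = np.zeros((1, layer_dims[l]))
    return params
```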
Forward Propagation
For $l=1,2,…,L$
\[Z^{[l]} = A^{[l-1]}W^{[l]}+b^{[l]}\]
\[A^{[l]} = g^{[l]}(Z^{[l]})\]
Here $b^{[l]}$ ($1\times n^{[l]}$) is broadcast across the $m$ rows of $A^{[l-1]}W^{[l]}$. Remember that $A^{[0]}$ is the input matrix $X$ and $A^{[L]}$ is our prediction $\hat{Y}$.
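A minimal NumPy sketch of this loop, assuming parameters are stored as in the initialization sketch above; the function and variable names are illustrative.

```python
def forward_propagation(X, params, activations):
    """Forward pass: X is A[0] with shape (m, n[0]); activations is [g[1], ..., g[L]]."""
    A = X
    caches = [(None, A)]                           # caches[l] = (Z[l], A[l]); A[0] = X
    L = len(activations)
    for l in range(1, L + 1):
        Z = A @ params[f"W{l}"] + params[f"b{l}"]  # Z[l] = A[l-1] W[l] + b[l]
        A = activations[l - 1](Z)                  # A[l] = g[l](Z[l])
        caches.append((Z, A))
    return A, caches                               # A is A[L], i.e. Y_hat
```

For example, with a ReLU hidden layer and a sigmoid output layer, `activations` could be `[lambda z: np.maximum(0, z), lambda z: 1 / (1 + np.exp(-z))]`.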
Backward Propagation
For the output layer:
\[dZ^{[L]} = \frac{1}{m}(A^{[L]}-Y)\]
\[dW^{[L]} = {A^{[L-1]}}^T dZ^{[L]}\]
\[db^{[L]} = \sum_{i=1}^m dZ^{[L](i)}\]
This form of $dZ^{[L]}$ holds for a sigmoid (or softmax) output layer with cross-entropy cost; the $\frac{1}{m}$ comes from averaging the cost over the $m$ examples.
For $l=L-1,L-2,\dots,1$:
\[dA^{[l]} = dZ^{[l+1]}{W^{[l+1]}}^T\]
\[dZ^{[l]} = dA^{[l]} * {g^{[l]}}'(Z^{[l]})\]
\[dW^{[l]} = {A^{[l-1]}}^T dZ^{[l]}\]
\[db^{[l]} = \sum_{i=1}^m dZ^{[l](i)}\]
where $*$ denotes element-wise multiplication.
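A minimal NumPy sketch of the backward pass, building on the forward-pass sketch above and under the same assumption that $dZ^{[L]} = \frac{1}{m}(A^{[L]}-Y)$; `activation_derivs` (the list of ${g^{[l]}}'$ functions) and the other names are illustrative.

```python
def backward_propagation(Y, params, caches, activation_derivs):
    """Backward pass: Y has shape (m, n[L]); caches comes from forward_propagation.

    activation_derivs is [g[1]', ..., g[L]']; returns dW[l] and db[l] for all layers.
    """
    grads = {}
    m = Y.shape[0]
    L = len(caches) - 1
    _, AL = caches[L]
    dZ = (AL - Y) / m                                    # dZ[L] = (A[L] - Y) / m
    for l in range(L, 0, -1):
        _, A_prev = caches[l - 1]
        grads[f"dW{l}"] = A_prev.T @ dZ                  # dW[l] = A[l-1]^T dZ[l]
        grads[f"db{l}"] = dZ.sum(axis=0, keepdims=True)  # db[l] = sum over examples
        if l > 1:
            dA = dZ @ params[f"W{l}"].T                  # dA[l-1] = dZ[l] W[l]^T
            Z_prev, _ = caches[l - 1]
            dZ = dA * activation_derivs[l - 2](Z_prev)   # dZ[l-1] = dA[l-1] * g[l-1]'(Z[l-1])
    return grads
```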