Forward and Backward Propagation in Neural Networks

Original Source: https://www.coursera.org/specializations/deep-learning

Notations

The input layer is the 0th layer, the first hidden layer is the 1st layer, and so on.

The term inside square brackets gives the dimensions of the corresponding matrix.

$L$: total number of layers excluding input layer

$m$: number of training examples

$n^{[l]}$: number of units in the lth layer

$W^{[l]}$: weight matrix of the linear transformation that outputs the lth layer [$n^{[l-1]}\times n^{[l]}$]

$b^{[l]}$: bias of the linear transformation that outputs the lth layer [$1 \times n^{[l]}$]

$Z^{[l]}$: linear transformation output of the lth layer [$m\times n^{[l]}$]

$A^{[l]}$: activation matrix of the lth layer [$m\times n^{[l]}$]

  • $A^{[0]}$ equals $X$, the input matrix. When $l>0$, $A^{[l]}$ is the activation of $Z^{[l]}$.
  • $A^{[L]}$ equals $\hat{Y}$, our prediction.

$g^{[l]}$: activation function of the lth layer

$J$: cost

$dZ$ and $dW$ are abbreviations of $\frac{\partial J}{\partial Z}$ and $\frac{\partial J}{\partial W}$, respectively.
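As a quick sanity check on these shapes, here is a tiny NumPy snippet (the layer sizes and variable names are made up purely for illustration):

```python
import numpy as np

# Hypothetical network: n[0]=4 inputs, n[1]=3 hidden units, m=5 examples.
m, n0, n1 = 5, 4, 3
X = np.random.randn(m, n0)    # A[0] = X, shape m x n[0]
W1 = np.random.randn(n0, n1)  # W[1], shape n[0] x n[1]
b1 = np.zeros((1, n1))        # b[1], shape 1 x n[1]
Z1 = X @ W1 + b1              # Z[1], shape m x n[1] (b broadcasts over the rows)
assert Z1.shape == (m, n1)
```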

Forward Propagation

For $l=1,2,…,L$

\[Z^{[l]} = A^{[l-1]}W^{[l]}+b^{[l]}\] \[A^{[l]} = g^{[l]}(Z^{[l]})\]

Remember that $A^{[0]}$ is the input matrix $X$ and $A^{[L]}$ is our prediction $\hat{Y}$.
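A minimal NumPy sketch of this loop, assuming a `parameters` dict keyed `"W1", "b1", ..., "WL", "bL"` and a list `activations` of per-layer activation functions (these names are illustrative, not from the original source):

```python
import numpy as np

def forward_propagation(X, parameters, activations):
    # Rows of X are training examples (m x n[0]); W[l] is n[l-1] x n[l],
    # b[l] is 1 x n[l] and broadcasts over the rows.
    A = X                                   # A[0] = X
    cache = {"A0": A}
    L = len(activations)                    # number of layers, excluding the input layer
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = A @ W + b                       # Z[l] = A[l-1] W[l] + b[l]
        A = activations[l - 1](Z)           # A[l] = g[l](Z[l])
        cache["Z" + str(l)], cache["A" + str(l)] = Z, A
    return A, cache                         # A[L] = Y_hat, plus values saved for backprop
```

The $Z^{[l]}$ and $A^{[l]}$ values are cached because backward propagation (below) reuses them.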

Backward Propagation

For the output layer, assuming a cross-entropy cost averaged over the $m$ examples with a sigmoid or softmax output activation:

\[dZ^{[L]} = \frac{1}{m}(A^{[L]}-Y)\] \[dW^{[L]} = {A^{[L-1]}}^TdZ^{[L]}\] \[db^{[L]} = \sum_{i=1}^m dZ^{[L](i)}\]

For $l=L-1,L-2,…,1$

\[dA^{[l]} = dZ^{[l+1]}{W^{[l+1]}}^T\] \[dZ^{[l]} = dA^{[l]} * {g^{[l]}}'(Z^{[l]})\] \[dW^{[l]} = {A^{[l-1]}}^TdZ^{[l]}\] \[db^{[l]} = \sum_{i=1}^m dZ^{[l](i)}\]

where $*$ denotes the element-wise product.
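The same loop in NumPy, continuing the sketch above (again with illustrative names; the `(A[L] - Y) / m` initialization assumes the cross-entropy cost mentioned earlier):

```python
def backward_propagation(Y, parameters, cache, activation_derivs):
    # activation_derivs[l-1] is g[l]'; cache comes from forward_propagation above.
    grads = {}
    m = Y.shape[0]
    L = len(activation_derivs)
    dZ = (cache["A" + str(L)] - Y) / m                        # dZ[L]
    for l in range(L, 0, -1):
        A_prev = cache["A" + str(l - 1)]
        grads["dW" + str(l)] = A_prev.T @ dZ                  # dW[l] = A[l-1]^T dZ[l]
        grads["db" + str(l)] = dZ.sum(axis=0, keepdims=True)  # db[l] = sum over examples
        if l > 1:
            dA = dZ @ parameters["W" + str(l)].T              # dA[l-1] = dZ[l] W[l]^T
            dZ = dA * activation_derivs[l - 2](cache["Z" + str(l - 1)])  # dZ[l-1]
    return grads
```

Each gradient has the same shape as the parameter it corresponds to, so a gradient-descent step is simply $W^{[l]} \leftarrow W^{[l]} - \alpha\, dW^{[l]}$ and likewise for $b^{[l]}$.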
