Multi-task Learning

Original Source: https://www.coursera.org/specializations/deep-learning

Say we want to find out cars, pedestrians, road signs, and traffic lights in an image. There are two ways to do this with machine learning.

First, we can train 4 neural networks with binary classification.

Second, we can make one model predict multiple classes. For example, our model can predict that there are car and pedestrian in an image. This is called transfer learning.

When do we use multi-task learning?

  1. When training on a set of tasks that could benefit from having shared lower-level features
  2. When amount of data you have for each task is quite similar
  3. When you can train a big enough neural network to do well on all tasks

Architecture

Our prediction $\hat{Y}$ and label $Y$ is a matrix of shape $m\times k$ where $k$ is number of classes.

If $m=3$ and $k=4$(car, pedestrian, road sign, traffic light) and \(Y=\left[ {\begin{array}{} 1 & 1 & 0 & 1\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1\\ \end{array} } \right]\) then there are car, pedestrian and traffic light in $X^{(1)}$, pedestrian in $X^{(2)}$, road sign and traffic light in $X^{(3)}$.

Cost

\[J = -\frac{1}{m}\sum_{i=1}^m \sum_{j=1}^k (y_j^{(i)}\log(\hat{y}_j^{(i)})+(1-y_j^{(i)})\log(1-\hat{y}_j^{(i)}))\]

The cost can be computed even if some entries are not labled.

Leave a Comment