Activation Functions

Original Source: https://www.coursera.org/specializations/deep-learning

Why Use a Non-Linear Activation Function?

The reason we use hidden layers is to represent a complex, non-linear model; a non-linear activation function is what adds this complexity to the model.

If we use no activation function, or a linear one, the stacked layers collapse into a single linear transformation, so the final model is equivalent to plain linear regression or logistic regression.
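A minimal NumPy sketch of this collapse (the shapes and variable names are my own, not from the course): two stacked layers with a linear (identity) activation compute exactly the same function as one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))                                  # one input with 4 features

W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=(5, 1))    # "hidden" layer
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=(1, 1))    # output layer

# Two stacked layers with a linear (identity) activation
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# The same function rewritten as a single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
a2_single = W @ x + b

print(np.allclose(a2, a2_single))   # True: the hidden layer added no expressive power
```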

Pros and Cons of Activation Functions

[Figure: activation functions]

  • For hidden layers, tanh almost always works better than sigmoid because its output is zero-centered, which makes learning in the next layer easier.
  • With sigmoid and tanh, when the absolute value of z becomes large, the slope of the function approaches 0, so gradient descent slows down (see the sketch after this list).
  • Most of the time we use ReLU, which trains faster than tanh because its gradient does not shrink toward 0 for positive z.
  • Leaky ReLU is also worth trying; it keeps a small, non-zero gradient when z is negative.
  • Sigmoid is commonly used as the activation function of the output layer in binary classification, since its output can be read as a probability.
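A minimal NumPy sketch (the helper names are my own, not from the course) comparing the derivatives of these activations. It shows why sigmoid and tanh saturate for large |z| while ReLU keeps a constant gradient for positive z.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # at most 0.25 (at z = 0), near 0 for large |z|

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2      # also near 0 for large |z|

def d_relu(z):
    return np.where(z > 0, 1.0, 0.0)  # constant 1 for z > 0, so it never saturates

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small non-zero slope for z < 0

for z in (0.5, 2.0, 10.0):
    print(f"z={z:5}: d_sigmoid={d_sigmoid(z):.5f}  d_tanh={d_tanh(z):.5f}  d_relu={d_relu(z):.1f}")
```

Running it, the sigmoid and tanh derivatives drop to roughly 1e-4 and 1e-8 at z = 10, while the ReLU derivative stays at 1, which is the intuition behind the "faster training" point in the list above.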
