Activation Functions
Original Source: https://www.coursera.org/specializations/deep-learning
Why Use a Non-Linear Activation Function?
The reason we use hidden layers is to represent a complex, non-linear model; a non-linear activation function is what adds this complexity.
If we use no activation function, or a linear activation function, the whole network collapses to a single linear regression (or logistic regression, if the output unit is a sigmoid), because a composition of linear functions is itself linear, as the short sketch below illustrates.
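As a quick illustration of this collapse, here is a minimal NumPy sketch (the layer sizes and random weights are made up for the example): two stacked linear layers compute exactly the same function as one suitably chosen linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))          # 3 input features, 5 examples

# Two stacked *linear* layers (no activation in between)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
two_layer = W2 @ (W1 @ x + b1) + b2

# Equivalent single linear layer: W x + b
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))   # True
```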
Pros and Cons of Activation Functions
- Tanh almost always works better than sigmoid for hidden units, because its output is centered around zero, which makes learning in the next layer easier.
- With sigmoid and tanh, when the absolute value of z becomes large, the slope of the function approaches 0, so gradient descent slows down (see the sketch after this list).
- Most of the time we use ReLU, which trains faster than tanh because its slope does not shrink toward 0 for positive z.
- Leaky ReLU is also worth trying; it keeps a small non-zero slope for negative z instead of a flat 0.
- Sigmoid is commonly used as the activation function of the output layer in binary classification, because its output lies in (0, 1) and can be interpreted as a probability.
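For reference, here is a minimal NumPy sketch of the activations discussed above together with their derivatives; the derivatives are what make sigmoid and tanh saturate for large |z| and what keep ReLU's gradient alive for positive z. The 0.01 leak slope for Leaky ReLU is a common default, not something fixed by the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)                    # -> 0 when |z| is large (saturation)

def tanh_grad(z):
    return 1 - np.tanh(z) ** 2            # also -> 0 when |z| is large

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    return (z > 0).astype(float)          # slope stays 1 for positive z

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)    # small non-zero slope for negative z

z = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(z))   # near 0 at z = -10 and z = 10
print(relu_grad(z))      # 1 wherever z is positive
```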