Why is the pure exponential not used as an activation function for neural networks?

The ReLU function is commonly used as an activation function in machine learning, as are its modifications (ELU, leaky ReLU).

The overall idea of these functions is the same: for x < 0 the value of the function is small (as x → -∞ it tends to 0 for ReLU or to -1 for ELU, while leaky ReLU keeps a small negative slope), and for x > 0 the function grows proportionally to x.

The exponential function (e^x or e^x - 1) behaves similarly for x < 0, and its derivative at x = 0 is 1, which is greater than the sigmoid's derivative there (0.25).
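As a quick check of the derivative claim, here is a minimal sketch that estimates both slopes at x = 0 with a central difference. The helper names `sigmoid` and `numerical_derivative` are mine, not from any library.

```python
import math

def sigmoid(x):
    # Standard logistic function 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

d_exp = numerical_derivative(math.exp, 0.0)      # analytically e^0 = 1
d_sigmoid = numerical_derivative(sigmoid, 0.0)   # analytically 1/4

print(f"exp'(0)     ~ {d_exp:.6f}")
print(f"sigmoid'(0) ~ {d_sigmoid:.6f}")
```

The analytic values are e^0 = 1 for the exponential and sigmoid(0) * (1 - sigmoid(0)) = 0.25 for the sigmoid.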

The visualization below illustrates the exponent in comparison with ReLU and sigmoid activation functions.

[Figure: comparison of the exponential with some popular activations]

So, why is the simple function y = e^x not used as an activation function in neural networks?

Answer

I think the most prominent reason is stability. Consider consecutive layers with exponential activation, and what happens to the output when you feed the network a small number (e.g. x = 1). The forward calculation looks like:
o = exp(exp(exp(exp(1)))) ≈ e^3814279

The output blows up very quickly, and I don’t think you can train deep networks with this activation function unless you add other mechanisms such as clipping.
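To make the blow-up concrete, here is a rough sketch of that forward pass in plain Python, with weights and biases left out so only the activation matters. The function `forward` and its `clip` parameter are hypothetical, written just for this illustration; the clipping shown is one simple way to bound the input, not a recommended training recipe.

```python
import math

def forward(x, depth, clip=None):
    """Apply exp `depth` times and return every intermediate activation.

    If `clip` is given (a hypothetical safeguard), the input to each exp
    is capped at `clip` first. Overflow is reported as infinity.
    """
    values = []
    for _ in range(depth):
        if clip is not None:
            x = min(x, clip)
        try:
            x = math.exp(x)
        except OverflowError:
            # math.exp raises once the result exceeds the float64 range
            x = math.inf
        values.append(x)
    return values

activations = forward(1.0, 4)            # four stacked exp "layers"
clipped = forward(1.0, 4, clip=5.0)      # same stack, inputs capped at 5

for layer, value in enumerate(activations, start=1):
    print(f"layer {layer}: {value}")
```

Without clipping, the third layer is already about 3.8 million and the fourth overflows double precision. With the cap, no activation can exceed e^5.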

Attribution
Source : Link , Question Author : MefAldemisov , Answer Author : gunes
