The ReLU function is commonly used as an activation function in machine learning, as are its modifications (ELU, leaky ReLU).

The overall idea of these functions is the same: before `x = 0` the value of the function is small (its limit as `x` goes to negative infinity is zero or `-1`); after `x = 0` the function grows proportionally to `x`. The exponential function (`e^x` or `e^x - 1`) has similar behavior, and its derivative at `x = 0` is greater than that of the sigmoid. The visualization below illustrates the exponential in comparison with the ReLU and sigmoid activation functions.

So why is the simple function `y = e^x` not used as an activation function in neural networks?
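
(The original figure is not reproduced here; the following is a minimal NumPy/Matplotlib sketch of the same comparison, with the plotting details assumed.)

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 200)

relu = np.maximum(0, x)            # ReLU: 0 for x < 0, x for x >= 0
sigmoid = 1 / (1 + np.exp(-x))     # sigmoid: saturates at 0 and 1
exp_minus_one = np.exp(x) - 1      # e^x - 1: tends to -1 as x -> -inf, grows fast for x > 0

plt.plot(x, relu, label="ReLU")
plt.plot(x, sigmoid, label="sigmoid")
plt.plot(x, exp_minus_one, label="e^x - 1")
plt.axvline(0, color="gray", linewidth=0.5)
plt.ylim(-2, 5)
plt.legend()
plt.title("Exponential vs. ReLU and sigmoid around x = 0")
plt.show()
```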

**Answer**

I think the most prominent reason is stability. Think about having consecutive layers with exponential activation: when you feed a small number into the NN (e.g. x = 1), the forward calculation looks like:

`o = exp(exp(exp(exp(1)))) ≈ e^3814279`

It can go crazy very quickly and I don’t think you can train deep networks with this activation function unless you add other mechanisms like clipping.
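
As a rough illustration (a minimal NumPy sketch, with each layer reduced to just the activation so the weights play no role), stacking `exp` four times overflows float64 almost immediately, while ReLU leaves the same input unchanged:

```python
import numpy as np

def forward(x, activation, depth=4):
    """Pass a scalar through `depth` layers that apply only the activation
    (weights fixed to 1 to isolate the activation's effect)."""
    for _ in range(depth):
        x = activation(x)
    return x

relu = lambda x: np.maximum(0.0, x)

with np.errstate(over="ignore"):      # silence the expected overflow warning
    print(forward(1.0, np.exp))       # exp(exp(exp(exp(1)))) ~ e^3814279 -> inf in float64
    print(forward(1.0, relu))         # stays 1.0: ReLU does not amplify the magnitude
```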

**Attribution**

*Source: Link, Question Author: MefAldemisov, Answer Author: gunes*