# Why is the pure exponential not used as an activation function for neural networks?

The ReLU function is commonly used as an activation function in machine learning, as are its modifications (ELU, leaky ReLU).

The overall idea of these functions is the same: for x < 0 the value of the function is small (its limit as x → −∞ is zero or −1), and for x > 0 the function grows proportionally to x.

The exponential function (e^x, or e^x − 1) has similar behavior, and its derivative at x = 0 is greater than that of the sigmoid.
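The derivative comparison can be checked directly: the slope of e^x at 0 is e^0 = 1, while the sigmoid's derivative σ(x)(1 − σ(x)) peaks at 0.25. A minimal sketch (function definitions are the standard ones, not taken from the post):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Slope of e^x at x = 0: d/dx e^x = e^x, so e^0 = 1
exp_slope = math.exp(0)

# Slope of sigmoid at x = 0: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
sig_slope = sigmoid(0) * (1 - sigmoid(0))

print(exp_slope, sig_slope)  # 1.0 0.25
```

So around the origin the exponential passes gradients through four times more strongly than the sigmoid.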

The visualization below illustrates the exponential in comparison with the ReLU and sigmoid activation functions.

So why is the simple function y = e^x not used as an activation function in neural networks?

I think the most prominent reason is numerical stability. Think about a stack of consecutive layers with exponential activation, and what happens to the output when you feed even a small number into the NN (e.g. $x=1$); the forward calculation will look like:
$$o=\exp(\exp(\exp(\exp(1))))\approx e^{3814279}$$
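You can watch this blow up in a few lines of Python: standard 64-bit floats overflow once the value exceeds about e^709, so a fourth exponential layer already fails (the loop below is an illustration of the growth, not a real network):

```python
import math

x = 1.0
# Apply an e^x "layer" repeatedly; float64 overflows past exp(~709)
for layer in range(1, 5):
    try:
        x = math.exp(x)
        print(f"after layer {layer}: {x:.1f}")
    except OverflowError:
        print(f"after layer {layer}: overflow (result would be about e^{x:.0f})")
        break
```

After three layers the activation is already about 3.8 million, and the fourth layer would need to represent e^3814279, far beyond any floating-point range.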