I am reading Chapter 11 of Elements of Statistical Learning and came across this sentence:

“Unlike methods like CART and MARS, neural networks are smooth functions of real-valued parameters.”

What is meant by ‘smooth functions’ here? I have come across things such as smoothing splines, but am unsure what a ‘smooth function’ means more generally.

Following on from the above, what makes neural networks specifically smooth functions?

**Answer**

A smooth function has continuous derivatives up to some specified order. At the very least, this implies that the function is continuously differentiable (i.e. the first derivative exists everywhere and is continuous). More specifically, a function is C^k smooth if its 1st through kth-order derivatives exist everywhere and are continuous.
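As a quick numerical sketch of the definition (with functions chosen purely for illustration): |x| is continuous but not C^1, since its one-sided difference quotients at 0 disagree, whereas x² is smooth, so they agree.

```python
# Compare one-sided difference quotients at x0: if the function is
# differentiable there, the left and right slopes converge to the
# same value; for |x| at 0 they do not.
def one_sided_slopes(f, x0=0.0, h=1e-6):
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right

abs_left, abs_right = one_sided_slopes(abs)           # -1.0 vs +1.0: no derivative at 0
sq_left, sq_right = one_sided_slopes(lambda x: x * x)  # both near 0: derivative exists
print(abs_left, abs_right)  # -1.0 1.0
print(sq_left, sq_right)
```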

Neural nets can be written as compositions of elementary functions (typically affine transformations and nonlinear activation functions, but there are other possibilities). For example, in feedforward networks, each layer implements a function whose output is passed as input to the next layer. Historically, neural nets have *tended* to be smooth, because the elementary functions used to construct them were themselves smooth. In particular, nonlinear activation functions were typically chosen to be smooth sigmoidal functions like tanh or the logistic sigmoid function.

However, the quote is *not* generally true. Modern neural nets often use piecewise linear activation functions like the rectified linear (ReLU) activation function and its variants. Although this function is continuous, it’s not smooth because the derivative doesn’t exist at zero. Therefore, neural nets using these activation functions are not smooth either.
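The kink at zero is easy to exhibit numerically: ReLU's one-sided slopes at 0 are 0 (from the left) and 1 (from the right), so no derivative exists there.

```python
# ReLU(x) = max(0, x) is continuous but not C^1: the one-sided
# difference quotients at 0 disagree.
def relu(x):
    return max(0.0, x)

h = 1e-8
left = (relu(0.0) - relu(-h)) / h   # -> 0.0
right = (relu(h) - relu(0.0)) / h   # -> 1.0
print(left, right)  # 0.0 1.0
```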

In fact, the quote isn’t generally true, even historically. The McCulloch-Pitts model was the first artificial neural net. It was composed of thresholded linear units, which output binary values. This is equivalent to using a step function as the activation function. This function isn’t even continuous, let alone smooth.
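To make the last point concrete, the step (Heaviside) activation jumps from 0 to 1 at the threshold, so it fails even continuity:

```python
# A thresholded unit's activation: outputs a binary value.
# Arbitrarily close inputs on either side of 0 give different outputs,
# so the function is discontinuous there (hence certainly not smooth).
def step(x):
    return 1.0 if x >= 0 else 0.0

eps = 1e-12
print(step(-eps), step(eps))  # 0.0 1.0
```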

**Attribution**
*Source: Link, Question Author: Sean, Answer Author: user20160*