By natural spline, I mean a regression spline that’s linear at the boundary (i.e. the regions where X is smaller than the smallest knot or larger than the largest knot).

I know that for smoothing splines, the function that minimizes the relevant objective function is a shrunken version of a natural cubic spline with knots at each of the observations.

But for a natural spline estimated by least squares (or perhaps lasso/ridge regression) with a smaller number of knots… should the spline necessarily be cubic? Or (if the goal is prediction of a target variable in a machine learning context), should the degree be chosen by cross-validation instead of always using cubic?

**Answer**

This is probably anticlimactic… I think this is largely a matter of convention if we want to consider the resulting fit to be smooth. It both stems from and feeds the fact that by a *smooth* function one commonly means “*twice differentiable*”. To quote Faraway’s *Linear Models with R*: “*The basis function is continuous and is also continuous in its first and second derivatives at each knotpoint. This property ensures the smoothness of the fit.*”

To start with an example: such a convention immediately lets us invoke Taylor’s theorem, so that if $g$ is a smooth function there exists a $\psi \in (0,x)$ such that $g(x) = g(0) + xg'(0) + \frac{x^2}{2}g''(\psi)$. Higher-order derivatives definitely do matter at times, but the usual convention is to check the first two and proceed.
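As a quick numerical sanity check of the statement above (a sketch, not part of the original answer, using $g(x) = e^x$ so that $g(0) = g'(0) = 1$ and $g''(t) = e^t$): the quantity $2R(x)/x^2$, where $R(x) = g(x) - g(0) - xg'(0)$, should recover $g''(\psi)$ for some $\psi \in (0, x)$, and must therefore lie between $g''(0) = 1$ and $g''(x) = e^x$.

```python
import math

def remainder(g, g0, g_prime0, x):
    """R(x) = g(x) - g(0) - x*g'(0); Taylor says R(x) = (x^2/2) * g''(psi)."""
    return g(x) - g0 - x * g_prime0

for x in (0.1, 0.5, 1.0):
    r = remainder(math.exp, 1.0, 1.0, x)
    # 2R/x^2 recovers g''(psi); it must lie strictly between g''(0)=1 and g''(x)=e^x
    g2_psi = 2 * r / x**2
    assert 1.0 < g2_psi < math.exp(x)
    print(x, g2_psi)
```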

Additionally, following the rationale from Ramsay & Silverman’s seminal book on *Functional Data Analysis*, the second derivative $g''(x)$ of a function at $x$ is often called its curvature at $x$, and its squared integral (i.e. the integrated squared second derivative: $\int [g''(x)]^2 dx$) can be seen as a natural measure of a function’s smoothness (or roughness, depending on how we look at it). This working assumption of “smooth enough because the second derivative exists” is almost universal when working with curves/functional data (e.g. Horváth & Kokoszka’s *Inference for Functional Data with Applications* and Ferraty & Vieu’s *Nonparametric Functional Data Analysis* adopt similar conventions); once again, this is a *working assumption* and *not* a hard requirement. It also goes without saying that if we treat $g''(x)$ itself as our unit of analysis, we implicitly assume that $g''''(x)$ exists, and so forth. As a side-note: the existence of a second derivative is associated with the isotropy of a function (e.g. see Switzer (1976), *Geometrical measures of the smoothness of random functions*), which is a reasonable assumption for data assumed to lie on a continuum (e.g. to have spatial dependence).
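The roughness measure $\int [g''(x)]^2 dx$ is easy to compute numerically. A minimal sketch (my own illustration, using SciPy’s `CubicSpline` with natural boundary conditions and a simple trapezoid rule): a wiggly curve gets a large penalty, while a spline fitted through collinear points collapses to a straight line and gets a penalty of essentially zero.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def roughness(spline, a, b, num=2001):
    """Approximate the integrated squared second derivative, ∫ [g''(x)]^2 dx."""
    xs = np.linspace(a, b, num)
    d2 = spline(xs, 2)  # evaluate the second derivative at xs
    # trapezoid rule on [a, b]
    return float(np.sum((d2[:-1] ** 2 + d2[1:] ** 2) / 2 * np.diff(xs)))

x = np.linspace(0, 1, 9)
wiggly = CubicSpline(x, np.sin(4 * np.pi * x), bc_type='natural')
flat = CubicSpline(x, 2 * x + 1, bc_type='natural')  # collinear data

print(roughness(wiggly, 0, 1))  # large: the curve bends a lot
print(roughness(flat, 0, 1))    # ~0: a straight line has no curvature
```

A natural cubic spline is in fact the interpolant that minimises this penalty, which is one more reason the cubic convention is so entrenched.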

Let me note that there is no reason why a higher- or lower-order requirement for the continuity of derivatives cannot be used. For example, we might choose to use a piecewise linear interpolant in cases where we have an insufficient amount of data. Finally, the degree of smoothness is indeed chosen using a cross-validation approach (usually Generalised Cross-Validation, to be more exact) based on the metric we choose (for example, the popular function `mgcv::gam` does exactly that when fitting smoothing splines, Yao et al. (2005), *Functional linear regression analysis for longitudinal data*, does the same when picking the bandwidth of its kernel smoothers, etc.).

One might also find the following Math.SE thread insightful: *Is the second derivative of a function related to curve smoothness?* Unfortunately, it does not contain a definitive answer.

So, “*why are natural splines almost always cubic?*” Because assuming the existence of second derivatives, and thus fitting cubics, is a good convention for most cases. ☺

**Attribution**
*Source: Link, Question Author: Mathew Carroll, Answer Author: usεr11852*