Why are natural splines almost always cubic?

By natural spline, I mean a regression spline that’s linear at the boundary (i.e. the regions where X is smaller than the smallest knot or larger than the largest knot).

I know that for smoothing splines, the function that minimizes the relevant objective function is a shrunken version of a natural cubic spline with knots at each of the observations.

But for a natural spline estimated by least squares (or perhaps lasso/ridge regression) with a smaller number of knots… should the spline necessarily be cubic? Or (if the goal is prediction of a target variable in a machine learning context), should the degree be chosen by cross-validation instead of always using cubic?


This is probably anticlimactic… I think this is largely a matter of convention if we want to consider the resulting fit smooth. It both stems from and feeds on the fact that by a smooth function one commonly means "twice continuously differentiable". To quote Faraway's Linear Models with R: "The basis function is continuous and is also continuous in its first and second derivatives at each knotpoint. This property ensures the smoothness of the fit."

To start with an example: such a convention immediately makes Taylor's theorem available, in the sense that if $g$ is a smooth function then there exists a $\psi \in (0,x)$ such that $g(x) = g(0) + xg'(0) + \frac{x^2}{2}g''(\psi)$. Higher-order derivatives certainly do matter at times, but the usual convention is to check the first two and proceed.
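As a small numerical sanity check of the Lagrange form of the remainder (a sketch, with an arbitrarily chosen function and evaluation point): for $g(x) = \cos(x)$, the value $2\,[g(x) - g(0) - xg'(0)]/x^2$ should be attained by $g''$ somewhere on $(0, x)$.

```python
import numpy as np

# Illustrative smooth function g(x) = cos(x), with its first two derivatives.
g = np.cos
dg = lambda t: -np.sin(t)
d2g = lambda t: -np.cos(t)

x = 0.8  # arbitrary evaluation point
remainder = g(x) - (g(0.0) + x * dg(0.0))  # g(x) minus its first-order Taylor expansion
target = 2.0 * remainder / x**2            # equals g''(psi) for some psi in (0, x)

# The continuous function g'' must attain `target` somewhere on (0, x),
# so `target` lies between the min and max of g'' on that interval.
psi_grid = np.linspace(1e-6, x, 10001)
vals = d2g(psi_grid)
print(vals.min() <= target <= vals.max())  # True
```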

Additionally, following the rationale from Ramsay & Silverman's seminal book on Functional Data Analysis, the second derivative $g''(x)$ of a function at $x$ is often called its curvature at $x$, and its integrated square (i.e. $\int [g''(x)]^2dx$) can be seen as a natural measure of a function's smoothness (or roughness, depending on how we look at it). This working assumption of "smooth enough because the second derivative exists" is almost universal when working with curves/functional data (e.g. Horváth & Kokoszka's Inference for Functional Data with Applications and Ferraty & Vieu's Nonparametric Functional Data Analysis adopt a similar convention); once again, this is a working assumption and not a hard requirement. It also goes without saying that if we work with $g''(x)$ as our unit of analysis we assume that $g''''(x)$ exists, and so forth. As a side-note: the existence of a second derivative is associated with the isotropy of a function (e.g. see Switzer (1976) Geometrical measures of the smoothness of random functions), which is a reasonable assumption for data assumed to lie on a continuum (e.g. to have spatial dependence).
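The roughness measure above is easy to compute numerically. A minimal sketch (with made-up data): fit a natural cubic spline with `scipy.interpolate.CubicSpline` (its `bc_type="natural"` option forces the second derivative to zero at the boundary, which is exactly the linear-tail behaviour the question describes) and approximate $\int [g''(t)]^2 dt$ by quadrature.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import CubicSpline

# Hypothetical noisy observations on a grid (purely illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Natural cubic spline: g''(x) = 0 at both boundary points,
# so the interpolant extends linearly outside the data range.
g = CubicSpline(x, y, bc_type="natural")

# Roughness: integrated squared second derivative, int [g''(t)]^2 dt.
t = np.linspace(0.0, 1.0, 2001)
g2 = g(t, 2)  # evaluate the second derivative on a fine grid
roughness = trapezoid(g2**2, t)

print(f"g''(0) = {g(0.0, 2):.3g}, g''(1) = {g(1.0, 2):.3g}")  # both ~0 (natural boundary)
print(f"roughness = {roughness:.3f}")
```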

Let me note that there is no reason why a higher- or lower-order requirement for the continuity of derivatives cannot be used. For example, we might choose a piecewise linear interpolant in cases where we have insufficient data. Finally, the degree of smoothness is indeed often chosen by a cross-validation approach (usually Generalised Cross-Validation, to be exact) based on the metric we choose; for example, the popular function mgcv::gam does exactly that when fitting penalised regression splines, Yao et al. (2005) Functional linear regression analysis for longitudinal data does the same when picking the bandwidth of its kernel smoothers, etc.

One might also find the following Math.SE thread insightful: Is second derivative of a function related to curve smoothness? Unfortunately, it does not contain a definitive answer.

So, "why are natural splines almost always cubic?" Because assuming the existence of second derivatives, and thus using a cubic fit, is a good convention for most cases. ☺

Source: Link, Question Author: Mathew Carroll, Answer Author: usεr11852
