I have a general question. I recently learnt about basis expansion and regularization. There are several interesting techniques, including cubic splines, natural splines, B-splines, and smoothing splines.
The question is: what are the pros and cons (if any) of smoothing splines compared to the "typical" cubic and natural splines, where users have to select the knots?
Well, generally it is not sensible to ask which method is better without the context of a real problem. So I am just asking: based on your experience, which one is better?
One pro I can see is that the smoothing spline technique avoids having to select the knots.
The terminology of splines can be confusing (at least I find it so), as exactly what people mean when they use "cubic spline", for example, depends on the type of cubic spline; we can have, for example, both cubic smoothing splines and cubic (penalised) regression splines.
What I sketch below is taken from sections 5.1.2 and 5.2 of Wood (2017).
An interpolating spline g(x), say, would set g(xi) = yi; it interpolates the observations yi via a function composed of sections of cubic polynomials, joined such that the spline is continuous up to the second derivative.
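As a quick illustration of the interpolation property, scipy's `CubicSpline` fits exactly this kind of spline (a sketch; the test function, noise level, and seed are arbitrary):

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 10, 15))
y = np.sin(x) + rng.normal(0, 0.3, 15)

# An interpolating cubic spline passes through every (x_i, y_i) exactly,
# with continuous first and second derivatives at the interior knots.
g = CubicSpline(x, y)
```

Note that `g` reproduces every observation, noise and all, which is precisely why interpolation is usually not what we want for noisy data.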
A cubic smoothing spline aims to balance fit to the data with producing a smooth function; unlike an interpolating spline, the aim is not to interpolate the data. Rather than setting g(xi) = yi, a cubic smoothing spline has n free parameters that are estimated so as to minimise (Wood, 2017)

$$\sum_{i=1}^{n} \{y_i - f(x_i)\}^2 + \lambda \int f''(x)^2 \, dx$$
where the first part is a measure of the fit to the data, whilst the second part is a penalty against wiggliness (the integral sums up the squared second derivative of the spline as a measure of its curvature or wiggliness, i.e. how fast the curve changes slope). We can think of wiggliness as complexity, so the criterion includes a penalty against overly complex smooths.
It can be shown that, of all possible functions f, the cubic smoothing spline g(x) is the function that minimises the above criterion (a proof is given in Wood, 2017, section 5.1.2, p. 198).
As with an interpolating spline, a cubic smoothing spline has knots located at each observation pair xi, yi. Earlier I mentioned that a smoothing spline has n free parameters; there are as many parameters as data. Yet the effect of λ, the penalty against overly wiggly smooths, is to produce a spline that is much smoother than its n degrees of freedom would suggest (Wood, 2017).
This is the major drawback of smoothing splines: you have to estimate as many parameters as you have data, and yet the effect of many of those parameters will in general be small because of the penalty against overly complex (wiggly) fits.
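This behaviour can be sketched with scipy's `make_smoothing_spline`, which fits a natural cubic smoothing spline with a knot at every data point; the λ values and the numerical wiggliness measure below are illustrative choices, not canonical ones:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
n = 50
x = np.sort(rng.uniform(0, 10, n))
y = np.sin(x) + rng.normal(0, 0.3, n)

# Both fits have a knot at every x_i (n coefficients), but the penalty
# lambda shrinks the fit towards a smooth curve.
rough = make_smoothing_spline(x, y, lam=1e-4)   # small penalty: wiggly
smooth = make_smoothing_spline(x, y, lam=10.0)  # large penalty: smooth

def wiggliness(spl, a=0.0, b=10.0, m=2000):
    # crude numerical version of the integrated squared second derivative
    t = np.linspace(a, b, m)
    d2 = spl(t, nu=2)
    return float(np.mean(d2**2) * (b - a))
```

The heavily penalised fit has far less integrated curvature than the lightly penalised one, despite both having the same number of coefficients.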
Balancing this is the fact that the choice of knots in the smoothing spline is taken care of, because there is no choice.
Moving to the penalised regression spline setting, we now not only choose where to place the knots but also how many knots to use. How might we decide whether this is a useful trade-off, i.e. whether it is beneficial to fit the spline with a reduced number of knots even if we have to decide how many to use and where to put them?
In a penalised regression spline, rather than thinking of knots per se, think of the spline as being made up of basis functions: little functions, each with a coefficient, whose linear combination gives the value of the spline at a given xi. The choice now is how many basis functions to use to model the response, with that number k being much smaller than the number of data n. The theory underlying this choice is somewhat limited, or restricted to special cases or particular approaches to estimating λ, but the general idea is that the number of basis functions required grows only slowly with n in order to achieve performance close to the optimum represented by smoothing splines (summarised from Wood, 2017).
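A minimal sketch of this idea, using a cubic B-spline basis with k = 12 functions for n = 200 observations; I use a second-order difference penalty on the coefficients (the P-spline style of penalty, a common stand-in for the derivative-based wiggliness penalty), and the λ value is arbitrary:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
n, k = 200, 12                # n data, k basis functions (k << n)
x = np.sort(rng.uniform(0, 10, n))
y = np.sin(x) + rng.normal(0, 0.3, n)

# A cubic B-spline basis with k functions needs k + 4 knots,
# including degree-many repeated boundary knots.
deg = 3
interior = np.linspace(0, 10, k - deg + 1)
t = np.r_[[0.0] * deg, interior, [10.0] * deg]
X = BSpline.design_matrix(x, t, deg).toarray()   # n x k design matrix

# Second-order difference penalty on adjacent coefficients (P-spline style).
D = np.diff(np.eye(k), n=2, axis=0)
lam = 1.0

# Penalised least squares: minimise ||y - X b||^2 + lam * ||D b||^2
beta = np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)
fit = X @ beta
```

Only k = 12 coefficients are estimated here, rather than the n = 200 a smoothing spline would require, yet the penalty still controls the wiggliness of the fit.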
In general, where the knots are distributed through the data for a cubic regression spline does not have much of an effect on the fitted spline. Typical choices are to place k−1 knots evenly over the interval of x, or to place knots at the quantiles of the distribution of x. If you have a very uneven spread of observations over the range of x, it would be wasteful to place knots evenly, so you could instead concentrate them where you have data. Alternatively, transforming x in some way may even out its distribution such that placing knots evenly is reasonable again.
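The two placement strategies are easy to compare side by side; with a skewed covariate (an exponential draw here, as an illustrative example), quantile-based knots concentrate where the data are:

```python
import numpy as np

rng = np.random.default_rng(3)
# Unevenly spread covariate: most observations near zero
x = np.sort(rng.exponential(1.0, 200))

k = 10
even_knots = np.linspace(x.min(), x.max(), k)           # even over range of x
quantile_knots = np.quantile(x, np.linspace(0, 1, k))   # at quantiles of x
```

With even placement, several knots land in the sparse right tail; quantile placement pulls them back towards the bulk of the data.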
When fitting a spline model in higher dimensions, say a spline of two variables, knot placement is more problematic if the pairs x1i, x2i are limited to some region of the space spanned by x1 and x2; if the data do not occupy large parts of that space, then placing knots evenly will result in many of the knots being located away from the support of the data, which is wasteful. Strategies for dealing with this are available, such as space-filling algorithms, or using P-splines with sparse derivative-based penalties that allow for efficient estimation even with unevenly distributed data (e.g. Wood, 2016).
Wood, S. N. 2016. P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Stat. Comput. 1–5. doi:10.1007/s11222-016-9666-x (Open Access)
Wood, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, CRC Press.