# How can I fit a spline to data that contains values and 1st/2nd derivatives?

I have a dataset that contains, let’s say, some measurements for position, speed and acceleration. All come from the same “run”. I could construct a linear system and fit a polynomial to all of those measurements.

But can I do the same with splines? What is an ‘R’ way of doing this?

Here is some simulated data I would like to fit:

f <- function(x) 2+x-0.5*x^2+rnorm(length(x), mean=0, sd=0.1)
df <- function(x) 1-x+rnorm(length(x), mean=0, sd=0.3)
ddf <- function(x) -1+rnorm(length(x), mean=0, sd=0.6)

x_f <- runif(5, 0, 5)
x_df <- runif(8, 3, 8)
x_ddf <- runif(10, 4, 9)

data <- data.frame(type=rep('f'), x=x_f, y=f(x_f))
data <- rbind(data, data.frame(type=rep('df'), x=x_df, y=df(x_df)))
data <- rbind(data, data.frame(type=rep('ddf'), x=x_ddf, y=ddf(x_ddf)))

library(ggplot2)
ggplot(data, aes(x, y, color=type)) + geom_point()

library(splines)
m <- lm(data$$y ~ bs(data$$x, degree=6)) # but I want to fit on f, df, ddf. possible?


We will describe how a spline can be used through Kalman Filtering
(KF) techniques in relation with a State-Space Model (SSM). The fact
that some spline models can be represented by SSM and computed with KF
was revealed by C.F. Ansley and R. Kohn in the years 1980-1990. The
estimated function and its derivatives are the expectations of the
state conditional on the observations. These estimates are computed by
using a fixed interval smoothing, a routine task when using a SSM.

For the sake of simplicity, assume that the observations are made at
times $$t1 and that the observation number $$kk$$ at
$$tkt_k$$ involves only one derivative with order $$dkd_k$$ in
$${0,1,2}\{0,\,1,\,2\}$$. The observation part of the model writes as
$$y(tk)=f[dk](tk)+ε(tk) \tag{O1} y(t_k) = f^{[d_k]}(t_k) + \varepsilon(t_k)$$
where $$f(t)f(t)$$ denotes the unobserved true function and $$ε(tk)\varepsilon(t_k)$$
is a Gaussian error with variance $$H(tk)H(t_k)$$ depending on the derivation
order $$dkd_k$$. The (continuous time) transition equation takes the general form
$$\tag{T1} \frac{\text{d}}{\text{d}t}\boldsymbol{\alpha}(t) = \mathbf{A} \boldsymbol{\alpha}(t) + \boldsymbol{\eta}(t) \tag{T1} \frac{\text{d}}{\text{d}t}\boldsymbol{\alpha}(t) = \mathbf{A} \boldsymbol{\alpha}(t) + \boldsymbol{\eta}(t)$$
where $$\boldsymbol{\alpha}(t)\boldsymbol{\alpha}(t)$$ is the unobserved state vector and
$$\boldsymbol{\eta}(t)\boldsymbol{\eta}(t)$$ is a Gaussian white noise with covariance $$\mathbf{Q}\mathbf{Q}$$,
assumed to be independent of the observation noise r.vs $$\varepsilon(t_k)\varepsilon(t_k)$$.
In order to describe a spline, we consider a state obtained by stacking the $$mm$$
first derivatives, i.e. $$\boldsymbol{\alpha}(t) := [f(t),\, f^{[1]}(t), \, \dots,\, f^{[m-1]}(t)]^\top\boldsymbol{\alpha}(t) := [f(t),\, f^{[1]}(t), \, \dots,\, f^{[m-1]}(t)]^\top$$. The transition is
$$\begin{bmatrix} f^{[1]}(t) \\ f^{[2]}(t) \\ \vdots \\ f^{[m-1]}(t) \\ f^{[m]}(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & &\\ 0 & 0 & 1 & & \\ \vdots & & & \ddots &\\ & & & & 1\\ 0 & \dots & & & 0 \end{bmatrix} \begin{bmatrix} f(t) \\ f^{[1]}(t) \\ \vdots \\ f^{[m-2]}(t)\\ f^{[m-1]}(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots\\ 0 \\ \eta(t) \end{bmatrix} \begin{bmatrix} f^{[1]}(t) \\ f^{[2]}(t) \\ \vdots \\ f^{[m-1]}(t) \\ f^{[m]}(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & &\\ 0 & 0 & 1 & & \\ \vdots & & & \ddots &\\ & & & & 1\\ 0 & \dots & & & 0 \end{bmatrix} \begin{bmatrix} f(t) \\ f^{[1]}(t) \\ \vdots \\ f^{[m-2]}(t)\\ f^{[m-1]}(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots\\ 0 \\ \eta(t) \end{bmatrix}$$
and we then get a polynomial spline with order $$2m2m$$ (and degree
$$2m-12m-1$$). While $$m=2m=2$$ corresponds to the usual cubic spline, a higher
order will be required to use derivatives with order $$>1>1$$. In order to
stick to a classical SSM formalism we can rewrite (O1) as
$$\tag{O2} y(t_k) = \mathbf{Z}(t_k) \boldsymbol{\alpha}(t_k) + \varepsilon(t_k), \tag{O2} y(t_k) = \mathbf{Z}(t_k) \boldsymbol{\alpha}(t_k) + \varepsilon(t_k),$$
where the observation matrix $$\mathbf{Z}(t_k)\mathbf{Z}(t_k)$$ picks the suitable
derivative in $$\boldsymbol{\alpha}(t_k)\boldsymbol{\alpha}(t_k)$$ and the variance $$H(t_k)H(t_k)$$ of
$$\varepsilon(t_k)\varepsilon(t_k)$$ is chosen depending on $$d_kd_k$$. So $$\mathbf{Z}(t_k) = \mathbf{Z}^\star_{d_k + 1}\mathbf{Z}(t_k) = \mathbf{Z}^\star_{d_k + 1}$$ where $$\mathbf{Z}^\star_1 := [1,\,0,\,\dots,\,0]\mathbf{Z}^\star_1 := [1,\,0,\,\dots,\,0]$$,
$$\mathbf{Z}^\star_2 := [0,\,1,\,\dots\,0]\mathbf{Z}^\star_2 := [0,\,1,\,\dots\,0]$$ and $$\mathbf{Z}^\star_3 := [0,\,0,\,1, 0,\,\dots]\mathbf{Z}^\star_3 := [0,\,0,\,1, 0,\,\dots]$$.
Similarly $$H(t_k) = H^\star_{d_k+1}H(t_k) = H^\star_{d_k+1}$$ for three variances $$H^\star_1H^\star_1$$,
$$H^\star_2H^\star_2$$, and $$H^\star_3H^\star_3$$.

Although the transition is in continuous time, the KF is actually a
standard discrete time one. Indeed, we will in practice focus on
times $$tt$$ where we have an observation, or where we want to estimate
the derivatives. We can take the set $$\{t_k\}\{t_k\}$$ to be the union of
these two sets of times and assume that the observation at $$t_kt_k$$ can
be missing: this allows to estimate the $$mm$$ derivatives at any time
$$t_kt_k$$ regardless of the existence of an observation. There remains to
derive the discrete SSM.

We will use indices for discrete times, writing $$\boldsymbol{\alpha}_k\boldsymbol{\alpha}_k$$ for
$$\boldsymbol{\alpha}(t_k)\boldsymbol{\alpha}(t_k)$$ and so on. The discrete-time SSM takes the
form
\begin{align*} \tag{DT} \boldsymbol{\alpha}_{k+1} &= \mathbf{T}_k \,\boldsymbol{\alpha}_{k} + \boldsymbol{\eta}^\star_{k}\\ y_k &= \mathbf{Z}_k\boldsymbol{\alpha}_k + \varepsilon_k \end{align*}\begin{align*} \tag{DT} \boldsymbol{\alpha}_{k+1} &= \mathbf{T}_k \,\boldsymbol{\alpha}_{k} + \boldsymbol{\eta}^\star_{k}\\ y_k &= \mathbf{Z}_k\boldsymbol{\alpha}_k + \varepsilon_k \end{align*}
where the matrices $$\mathbf{T}_k\mathbf{T}_k$$ and $$\mathbf{Q}_k^\star := \text{Var}(\boldsymbol{\eta}_k^\star)\mathbf{Q}_k^\star := \text{Var}(\boldsymbol{\eta}_k^\star)$$ are derived from (T1) and (O2)
while the variance of $$\varepsilon_k\varepsilon_k$$ is given by
$$H_k=H^\star_{d_k+1}H_k=H^\star_{d_k+1}$$ provided that $$y_ky_k$$ is not missing.
Using some algebra we can find the transition
matrix for the discrete-time SSM
$$\mathbf{T}_k = \exp\left\{ \delta_k \mathbf{A} \right\} = \begin{bmatrix} 1 & \frac{\delta_k^1}{1!} & \frac{\delta_k^2}{2!} & \dots & \frac{\delta_k^{m-1}}{(m-1)!}\\ 0 & 1 & \frac{\delta_k^1}{1!} & & \\ \vdots & & & \ddots &\\ & & & & \frac{\delta_k^1}{1!}\\ 0 & \dots & & & 1 \end{bmatrix}, \qquad \mathbf{T}_k = \exp\left\{ \delta_k \mathbf{A} \right\} = \begin{bmatrix} 1 & \frac{\delta_k^1}{1!} & \frac{\delta_k^2}{2!} & \dots & \frac{\delta_k^{m-1}}{(m-1)!}\\ 0 & 1 & \frac{\delta_k^1}{1!} & & \\ \vdots & & & \ddots &\\ & & & & \frac{\delta_k^1}{1!}\\ 0 & \dots & & & 1 \end{bmatrix}, \qquad$$
where $$\delta_k:= t_{k+1} - t_{k}\delta_k:= t_{k+1} - t_{k}$$ for $$k. Similarly the
covariance matrix $$\mathbf{Q}^\star_k = \text{Var} (\boldsymbol{\eta}_k^\star)\mathbf{Q}^\star_k = \text{Var} (\boldsymbol{\eta}_k^\star)$$ for the discrete-time SSM can be
given as
$$\mathbf{Q}^\star_k= \sigma_\eta^2 \, \left[\frac{\delta_k^{2m-i-j+1}}{(m-i)!(m-j)! (2m-i-j+1)}\right]_{i,j} \mathbf{Q}^\star_k= \sigma_\eta^2 \, \left[\frac{\delta_k^{2m-i-j+1}}{(m-i)!(m-j)! (2m-i-j+1)}\right]_{i,j}$$
where the indices $$ii$$ and $$jj$$ are between $$11$$ and $$mm$$.

Now to carry over the computation in R we need a package devoted to KF
and accepting time-varying models; the CRAN package KFAS seems a
good option. We can write R functions to compute the matrices
$$\mathbf{T}_k\mathbf{T}_k$$ and $$\mathbf{Q}^\star_k\mathbf{Q}^\star_k$$ from the vector of times $$t_kt_k$$
in order to encode the SSM (DT). In the notations used by the package,
a matrix $$\mathbf{R}_k\mathbf{R}_k$$ comes to multiply the noise
$$\boldsymbol{\eta}^\star_k\boldsymbol{\eta}^\star_k$$ in the transition equation of (DT): we take
it here to be the identity $$\mathbf{I}_m\mathbf{I}_m$$. Also note that a diffuse
initial covariance must be used here.

EDIT The $$\mathbf{Q}^\star\mathbf{Q}^\star$$ as initially written was wrong. Fixed (aslo in R code and image).

C.F. Ansley and R. Kohn (1986) "On the Equivalence of Two Stochastic Approaches to Spline Smoothing" J. Appl. Probab., 23, pp. 391–405

R. Kohn and C.F. Ansley (1987) "A New Algorithm for Spline Smoothing Based
on Smoothing a Stochastic Process" SIAM J. Sci. and Stat. Comput.,
8(1), pp. 33–48

J. Helske (2017). "KFAS: Exponential Family State Space Models in
R" J. Stat. Soft., 78(10), pp. 1-39

smoothWithDer <- function(t, y, d, m = 3,
Hstar = c(3, 0.2, 0.1)^2, sigma2eta = 1.0^2) {

## define the SSM matrices, depending on 'delta_k' or on 'd_k'
Tfun <- function(delta) {
mat <-  matrix(0, nrow = m, ncol = m)
for (i in 0:(m-1)) {
mat[col(mat) == row(mat) + i] <- delta^i / gamma(i + 1)
}
mat
}
Qfun <- function(delta) {
im <- (m - 1):0
x <- delta^im / gamma(im + 1)
mat <- outer(X = x, Y = x, FUN = "*")
im2 <- outer(im, im, FUN = "+")
sigma2eta * mat * delta / (im2 + 1)
}
Zfun <-  function(d) {
Z <- matrix(0.0, nrow = 1, ncol = m)
Z[1, d + 1] <- 1.0
Z
}
Hfun <- function(d) ifelse(d >= 0, Hstar[d + 1], 0.0)
Rfun <- function() diag(x = 1.0, nrow = m)

## define arrays by stacking the SSM matrices. We need one more
## 'delta' at the end of the series
n <- length(t)
delta <-  diff(t)
delta <- c(delta, mean(delta))

Ta <- Qa <- array(0.0, dim = c(m, m, n))
Za <- array(0.0, dim = c(1, m, n))
Ha <- array(0.0, dim = c(1, 1, n))
Ra <-  array(0.0, dim = c(m, m, n))

for (k in 1:n) {
Ta[ , , k] <- Tfun(delta[k])
Qa[ , , k] <- Qfun(delta[k])
Za[ , , k] <- Zfun(d[k])
Ha[ , , k] <- Hfun(d[k])
Ra[ , , k] <- Rfun()
}

require(KFAS)
## define the SSM and perform Kalman Filtering and smoothing
mod <- SSModel(y ~ SSMcustom(Z = Za, T = Ta, R = Ra, Q = Qa, n = n,
P1 = matrix(0, nrow = m, ncol = m),
P1inf = diag(1.0, nrow = m),
state_names = paste0("d", 0:(m-1))) - 1)
out <- KFS(mod, smoothing = "state")
list(t = t, filtered = out$$att, smoothed = out$$alphahat)

}

## An example function as in OP
f <- function(t, d = rep(0, length = length(t))) {
f <- rep(NA, length(t))
if (any(ind <- (d == 0))) f[ind] <- 2.0 + t[ind] - 0.5 * t[ind]^2
if (any(ind <- (d == 1))) f[ind] <- 1.0 - t[ind]
if (any(ind <- (d == 2))) f[ind] <- -1.0
f
}

set.seed(123)
n <-  100
t <- seq(from = 0, to = 10, length = n)
Hstar <- c(3, 0.4, 0.2)^2
sigma2eta <- 1.0

fTrue <- cbind(d0 = f(t), d1 = f(t, d = 1), d2 = f(t, d = 2))

## ============================================================================
## use a derivative index of -1 to indicate non-observed values, where
## 'y' will be NA
##
## [RUN #0]  no derivative  m = 2 (cubic spline)
## ============================================================================
d0 <- sample(c(-1, 0), size = n, replace = TRUE, prob = c(0.7, 0.3))
ft0 <-  f(t, d0)
## add noise picking the right sd
y0 <- ft0 + rnorm(n = n, sd = c(0.0, sqrt(Hstar))[d0 + 2])
res0 <- smoothWithDer(t, y0, d0, m = 2, Hstar = Hstar)

## ============================================================================
## [RUN #1] Only first order derivative: we can take m = 2 (cubic spline)
## ============================================================================
d1 <- sample(c(-1, 0:1), size = n, replace = TRUE, prob = c(0.7, 0.15, 0.15))
ft1 <-  f(t, d1)
y1 <- ft1 + rnorm(n = n, sd = c(0.0, sqrt(Hstar))[d1 + 2])
res1 <- smoothWithDer(t, y1, d1, m = 2, Hstar = Hstar)

## ============================================================================
## [RUN #2] First and second order derivative: we can take m = 3
## (quintic spline)
## ============================================================================
d2 <- sample(c(-1, 0:2), size = n, replace = TRUE, prob = c(0.7, 0.1, 0.1, 0.1))
ft2 <-  f(t, d2)
y2 <- ft2 + rnorm(n = n, sd = c(0.0, sqrt(Hstar))[d2 + 2])
res2 <- smoothWithDer(t, y2, d2, m = 3, Hstar = Hstar)

## plots : a ggplot with facets would be better here.
for (run in 0:2) {
resrun <- get(paste0("res", run))
drun <- get(paste0("d", run))
yrun <- get(paste0("y", run))
matplot(t, resrun\$smoothed, pch = 16, cex = 0.7, ylab = "", xlab = "")
matlines(t, fTrue, lwd = 2, lty = 1)
for (dv in 0:2) {
points(t[drun == dv], yrun[drun == dv], cex = 1.2, pch = 22, lwd = 2,
bg = "white", col = dv + 1)
}
title(main = sprintf("run %d. Dots = smooothed, lines = true, square = obs", run))
legend("bottomleft", col = 1:3, legend = c("d0", "d1", "d2"), lty = 1)
}