I have a dataset that contains, let’s say, some measurements for position, speed and acceleration. All come from the same “run”. I could construct a linear system and fit a polynomial to all of those measurements.
But can I do the same with splines? What is an ‘R’ way of doing this?
Here is some simulated data I would like to fit:
f <- function(x) 2+x-0.5*x^2+rnorm(length(x), mean=0, sd=0.1) df <- function(x) 1-x+rnorm(length(x), mean=0, sd=0.3) ddf <- function(x) -1+rnorm(length(x), mean=0, sd=0.6) x_f <- runif(5, 0, 5) x_df <- runif(8, 3, 8) x_ddf <- runif(10, 4, 9) data <- data.frame(type=rep('f'), x=x_f, y=f(x_f)) data <- rbind(data, data.frame(type=rep('df'), x=x_df, y=df(x_df))) data <- rbind(data, data.frame(type=rep('ddf'), x=x_ddf, y=ddf(x_ddf))) library(ggplot2) ggplot(data, aes(x, y, color=type)) + geom_point() library(splines) m <- lm(data$y ~ bs(data$x, degree=6)) # but I want to fit on f, df, ddf. possible?
Answer
We will describe how a spline can be used through Kalman Filtering
(KF) techniques in relation with a State-Space Model (SSM). The fact
that some spline models can be represented by SSM and computed with KF
was revealed by C.F. Ansley and R. Kohn in the years 1980-1990. The
estimated function and its derivatives are the expectations of the
state conditional on the observations. These estimates are computed by
using a fixed interval smoothing, a routine task when using a SSM.
For the sake of simplicity, assume that the observations are made at
times t1<t2<⋯<tn and that the observation number k at
tk involves only one derivative with order dk in
{0,1,2}. The observation part of the model writes as
y(tk)=f[dk](tk)+ε(tk)
where f(t) denotes the unobserved true function and ε(tk)
is a Gaussian error with variance H(tk) depending on the derivation
order dk. The (continuous time) transition equation takes the general form
\tag{T1}
\frac{\text{d}}{\text{d}t}\boldsymbol{\alpha}(t) =
\mathbf{A} \boldsymbol{\alpha}(t) + \boldsymbol{\eta}(t)
where \boldsymbol{\alpha}(t) is the unobserved state vector and
\boldsymbol{\eta}(t) is a Gaussian white noise with covariance \mathbf{Q},
assumed to be independent of the observation noise r.vs \varepsilon(t_k).
In order to describe a spline, we consider a state obtained by stacking the m
first derivatives, i.e. \boldsymbol{\alpha}(t) := [f(t),\, f^{[1]}(t), \,
\dots,\, f^{[m-1]}(t)]^\top. The transition is
\begin{bmatrix}
f^{[1]}(t) \\
f^{[2]}(t) \\
\vdots \\
f^{[m-1]}(t) \\
f^{[m]}(t)
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 & 0 & &\\
0 & 0 & 1 & & \\
\vdots & & & \ddots &\\
& & & & 1\\
0 & \dots & & & 0
\end{bmatrix}
\begin{bmatrix}
f(t) \\
f^{[1]}(t) \\
\vdots \\
f^{[m-2]}(t)\\
f^{[m-1]}(t)
\end{bmatrix} +
\begin{bmatrix}
0 \\
0 \\
\vdots\\
0 \\
\eta(t)
\end{bmatrix}
and we then get a polynomial spline with order 2m (and degree
2m-1). While m=2 corresponds to the usual cubic spline, a higher
order will be required to use derivatives with order >1. In order to
stick to a classical SSM formalism we can rewrite (O1) as
\tag{O2}
y(t_k) = \mathbf{Z}(t_k) \boldsymbol{\alpha}(t_k) + \varepsilon(t_k),
where the observation matrix \mathbf{Z}(t_k) picks the suitable
derivative in \boldsymbol{\alpha}(t_k) and the variance H(t_k) of
\varepsilon(t_k) is chosen depending on d_k. So \mathbf{Z}(t_k) =
\mathbf{Z}^\star_{d_k + 1} where \mathbf{Z}^\star_1 := [1,\,0,\,\dots,\,0],
\mathbf{Z}^\star_2 := [0,\,1,\,\dots\,0] and \mathbf{Z}^\star_3 := [0,\,0,\,1, 0,\,\dots].
Similarly H(t_k) = H^\star_{d_k+1} for three variances H^\star_1,
H^\star_2, and H^\star_3.
Although the transition is in continuous time, the KF is actually a
standard discrete time one. Indeed, we will in practice focus on
times t where we have an observation, or where we want to estimate
the derivatives. We can take the set \{t_k\} to be the union of
these two sets of times and assume that the observation at t_k can
be missing: this allows to estimate the m derivatives at any time
t_k regardless of the existence of an observation. There remains to
derive the discrete SSM.
We will use indices for discrete times, writing \boldsymbol{\alpha}_k for
\boldsymbol{\alpha}(t_k) and so on. The discrete-time SSM takes the
form
\begin{align*}
\tag{DT}
\boldsymbol{\alpha}_{k+1} &= \mathbf{T}_k
\,\boldsymbol{\alpha}_{k} + \boldsymbol{\eta}^\star_{k}\\
y_k &= \mathbf{Z}_k\boldsymbol{\alpha}_k + \varepsilon_k
\end{align*}
where the matrices \mathbf{T}_k and \mathbf{Q}_k^\star :=
\text{Var}(\boldsymbol{\eta}_k^\star) are derived from (T1) and (O2)
while the variance of \varepsilon_k is given by
H_k=H^\star_{d_k+1} provided that y_k is not missing.
Using some algebra we can find the transition
matrix for the discrete-time SSM
\mathbf{T}_k = \exp\left\{ \delta_k \mathbf{A} \right\} =
\begin{bmatrix}
1 & \frac{\delta_k^1}{1!} & \frac{\delta_k^2}{2!} &
\dots & \frac{\delta_k^{m-1}}{(m-1)!}\\
0 & 1 & \frac{\delta_k^1}{1!} & & \\
\vdots & & & \ddots &\\
& & & & \frac{\delta_k^1}{1!}\\
0 & \dots & & & 1
\end{bmatrix}, \qquad
where \delta_k:= t_{k+1} - t_{k} for k<n. Similarly the
covariance matrix \mathbf{Q}^\star_k = \text{Var}
(\boldsymbol{\eta}_k^\star) for the discrete-time SSM can be
given as
\mathbf{Q}^\star_k=
\sigma_\eta^2 \, \left[\frac{\delta_k^{2m-i-j+1}}{(m-i)!(m-j)! (2m-i-j+1)}\right]_{i,j}
where the indices i and j are between 1 and m.
Now to carry over the computation in R we need a package devoted to KF
and accepting time-varying models; the CRAN package KFAS seems a
good option. We can write R functions to compute the matrices
\mathbf{T}_k and \mathbf{Q}^\star_k from the vector of times t_k
in order to encode the SSM (DT). In the notations used by the package,
a matrix \mathbf{R}_k comes to multiply the noise
\boldsymbol{\eta}^\star_k in the transition equation of (DT): we take
it here to be the identity \mathbf{I}_m. Also note that a diffuse
initial covariance must be used here.
EDIT The \mathbf{Q}^\star as initially written was wrong. Fixed (aslo in R code and image).
smoothWithDer <- function(t, y, d, m = 3,
Hstar = c(3, 0.2, 0.1)^2, sigma2eta = 1.0^2) {
## define the SSM matrices, depending on 'delta_k' or on 'd_k'
Tfun <- function(delta) {
mat <- matrix(0, nrow = m, ncol = m)
for (i in 0:(m-1)) {
mat[col(mat) == row(mat) + i] <- delta^i / gamma(i + 1)
}
mat
}
Qfun <- function(delta) {
im <- (m - 1):0
x <- delta^im / gamma(im + 1)
mat <- outer(X = x, Y = x, FUN = "*")
im2 <- outer(im, im, FUN = "+")
sigma2eta * mat * delta / (im2 + 1)
}
Zfun <- function(d) {
Z <- matrix(0.0, nrow = 1, ncol = m)
Z[1, d + 1] <- 1.0
Z
}
Hfun <- function(d) ifelse(d >= 0, Hstar[d + 1], 0.0)
Rfun <- function() diag(x = 1.0, nrow = m)
## define arrays by stacking the SSM matrices. We need one more
## 'delta' at the end of the series
n <- length(t)
delta <- diff(t)
delta <- c(delta, mean(delta))
Ta <- Qa <- array(0.0, dim = c(m, m, n))
Za <- array(0.0, dim = c(1, m, n))
Ha <- array(0.0, dim = c(1, 1, n))
Ra <- array(0.0, dim = c(m, m, n))
for (k in 1:n) {
Ta[ , , k] <- Tfun(delta[k])
Qa[ , , k] <- Qfun(delta[k])
Za[ , , k] <- Zfun(d[k])
Ha[ , , k] <- Hfun(d[k])
Ra[ , , k] <- Rfun()
}
require(KFAS)
## define the SSM and perform Kalman Filtering and smoothing
mod <- SSModel(y ~ SSMcustom(Z = Za, T = Ta, R = Ra, Q = Qa, n = n,
P1 = matrix(0, nrow = m, ncol = m),
P1inf = diag(1.0, nrow = m),
state_names = paste0("d", 0:(m-1))) - 1)
out <- KFS(mod, smoothing = "state")
list(t = t, filtered = out$att, smoothed = out$alphahat)
}
## An example function as in OP
f <- function(t, d = rep(0, length = length(t))) {
f <- rep(NA, length(t))
if (any(ind <- (d == 0))) f[ind] <- 2.0 + t[ind] - 0.5 * t[ind]^2
if (any(ind <- (d == 1))) f[ind] <- 1.0 - t[ind]
if (any(ind <- (d == 2))) f[ind] <- -1.0
f
}
set.seed(123)
n <- 100
t <- seq(from = 0, to = 10, length = n)
Hstar <- c(3, 0.4, 0.2)^2
sigma2eta <- 1.0
fTrue <- cbind(d0 = f(t), d1 = f(t, d = 1), d2 = f(t, d = 2))
## ============================================================================
## use a derivative index of -1 to indicate non-observed values, where
## 'y' will be NA
##
## [RUN #0] no derivative m = 2 (cubic spline)
## ============================================================================
d0 <- sample(c(-1, 0), size = n, replace = TRUE, prob = c(0.7, 0.3))
ft0 <- f(t, d0)
## add noise picking the right sd
y0 <- ft0 + rnorm(n = n, sd = c(0.0, sqrt(Hstar))[d0 + 2])
res0 <- smoothWithDer(t, y0, d0, m = 2, Hstar = Hstar)
## ============================================================================
## [RUN #1] Only first order derivative: we can take m = 2 (cubic spline)
## ============================================================================
d1 <- sample(c(-1, 0:1), size = n, replace = TRUE, prob = c(0.7, 0.15, 0.15))
ft1 <- f(t, d1)
y1 <- ft1 + rnorm(n = n, sd = c(0.0, sqrt(Hstar))[d1 + 2])
res1 <- smoothWithDer(t, y1, d1, m = 2, Hstar = Hstar)
## ============================================================================
## [RUN #2] First and second order derivative: we can take m = 3
## (quintic spline)
## ============================================================================
d2 <- sample(c(-1, 0:2), size = n, replace = TRUE, prob = c(0.7, 0.1, 0.1, 0.1))
ft2 <- f(t, d2)
y2 <- ft2 + rnorm(n = n, sd = c(0.0, sqrt(Hstar))[d2 + 2])
res2 <- smoothWithDer(t, y2, d2, m = 3, Hstar = Hstar)
## plots : a ggplot with facets would be better here.
for (run in 0:2) {
resrun <- get(paste0("res", run))
drun <- get(paste0("d", run))
yrun <- get(paste0("y", run))
matplot(t, resrun$smoothed, pch = 16, cex = 0.7, ylab = "", xlab = "")
matlines(t, fTrue, lwd = 2, lty = 1)
for (dv in 0:2) {
points(t[drun == dv], yrun[drun == dv], cex = 1.2, pch = 22, lwd = 2,
bg = "white", col = dv + 1)
}
title(main = sprintf("run %d. Dots = smooothed, lines = true, square = obs", run))
legend("bottomleft", col = 1:3, legend = c("d0", "d1", "d2"), lty = 1)
}
Attribution
Source : Link , Question Author : dani , Answer Author : Yves