How to use auto.arima to impute missing values

I have a zoo series with many missing values. I read that auto.arima can impute these missing values. Can anyone show me how to do it? Thanks a lot!

This is what I have tried, but without success:

fit <- auto.arima(tsx)
plot(forecast(fit))

Answer

First, be aware that forecast computes out-of-sample predictions, whereas you are interested in in-sample observations.

The Kalman filter handles missing values. Thus you can take the state space form of the ARIMA model from the output returned by forecast::auto.arima or stats::arima and pass it to KalmanRun.

Edit (fix in the code based on answer by stats0007)

In a previous version I took the column of the filtered states related to the observed series; however, I should use the entire matrix of states and apply the observation equation, y_t = Z α_t, at each time point. (Thanks to @stats0007 for the comments.) Below I update the code and plot accordingly.
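For reference, the state space form behind this correction can be sketched as below. The notation loosely follows the R documentation for arima and KalmanLike, with α_t standing for the state vector; the exact time indexing of the state equation is an assumption made here for illustration. For a model built by arima the observation noise variance h is zero, so an imputed value is simply Z applied to the estimated state.

```latex
\begin{aligned}
y_t &= Z\,\alpha_t + \varepsilon_t, & \varepsilon_t &\sim N(0,\, h),\\
\alpha_t &= T\,\alpha_{t-1} + R\,\eta_t, & \eta_t &\sim N(0,\, Q),
\end{aligned}
\qquad
\hat{y}_t = Z\,\hat{\alpha}_t \quad \text{for each missing } y_t .
```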

I use a ts object as a sample series instead of zoo, but it should be the same:

library(forecast)
# sample series
x0 <- x <- log(AirPassengers)
y <- x
# set some missing values
x[c(10,60:71,100,130)] <- NA
# fit model
fit <- auto.arima(x)
# Kalman filter
kr <- KalmanRun(x, fit$model)
# impute missing values Z %*% alpha at each missing observation
id.na <- which(is.na(x))
for (i in id.na)
  y[i] <- fit$model$Z %*% kr$states[i,]
# alternative to the explicit loop above
sapply(id.na, FUN = function(i, Z, alpha) Z %*% alpha[i,],
  Z = fit$model$Z, alpha = kr$states)
y[id.na]
# [1] 4.767653 5.348100 5.364654 5.397167 5.523751 5.478211 5.482107 5.593442
# [9] 5.666549 5.701984 5.569021 5.463723 5.339286 5.855145 6.005067
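The loop and the sapply call can also be collapsed into a single matrix product. The following is a minimal sketch, not the code used above: it assumes a fixed "airline" model fitted with stats::arima in place of the model selected by auto.arima, so that it runs with base R only, and the name imputed is introduced here for illustration. The numbers it produces may therefore differ slightly from the output shown above.

```r
# Sample series with the same artificial gaps as in the answer
x0 <- log(AirPassengers)
x <- x0
x[c(10, 60:71, 100, 130)] <- NA

# Assumption: a fixed airline model stands in for the auto.arima() selection
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")

# Kalman filter: one row of estimated states per time point
kr <- KalmanRun(x, fit$model)
id.na <- which(is.na(x))

# Z %*% alpha for all missing time points at once
imputed <- drop(kr$states[id.na, , drop = FALSE] %*% fit$model$Z)

# Fill the gaps in a copy of the series
y <- x
y[id.na] <- imputed
```

Keeping drop = FALSE in the row subset means the product is still well defined when only one observation is missing.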

You can plot the result (for the whole series and for the entire year with missing observations in the middle of the sample):

par(mfrow = c(2, 1), mar = c(2.2,2.2,2,2))
plot(x0, col = "gray")
lines(x)
points(time(x0)[id.na], x0[id.na], col = "blue", pch = 19)
points(time(y)[id.na], y[id.na], col = "red", pch = 17)
legend("topleft", legend = c("true values", "imputed values"), 
  col = c("blue", "red"), pch = c(19, 17))
plot(time(x0)[60:71], x0[60:71], type = "b", col = "blue", 
  pch = 19, ylim = range(x0[60:71]))
points(time(y)[60:71], y[60:71], col = "red", pch = 17)
lines(time(y)[60:71], y[60:71], col = "red")
legend("topleft", legend = c("true values", "imputed values"), 
  col = c("blue", "red"), pch = c(19, 17), lty = c(1, 1))

[Figure: plot of the original series and the values imputed to missing observations]

You can repeat the same example using the Kalman smoother instead of the Kalman filter. All you need to change are these lines:

kr <- KalmanSmooth(x, fit$model)
y[i] <- fit$model$Z %*% kr$smooth[i,]

Dealing with missing observations by means of the Kalman filter is sometimes interpreted as extrapolation of the series; when the Kalman smoother is used, missing observations are said to be filled in by interpolation in the observed series.
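To see how the two approaches differ on this example, the filtered and smoothed imputations can be compared against the values that were removed. This is a rough sketch under the same assumption as before (a fixed airline model from stats::arima rather than the auto.arima selection); the helper rmse and the names y.filter and y.smooth are hypothetical and introduced here only for illustration.

```r
# Sample series with the same artificial gaps as in the answer
x0 <- log(AirPassengers)
x <- x0
x[c(10, 60:71, 100, 130)] <- NA
id.na <- which(is.na(x))

# Assumption: a fixed airline model stands in for the auto.arima() selection
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")
Z <- fit$model$Z

# Filtered states use observations up to time t; smoothed states use the whole sample
filt <- KalmanRun(x, fit$model)$states
smth <- KalmanSmooth(x, fit$model)$smooth

# Apply the observation equation at the missing time points
y.filter <- drop(filt[id.na, , drop = FALSE] %*% Z)
y.smooth <- drop(smth[id.na, , drop = FALSE] %*% Z)

# Root mean squared error against the values that were removed
rmse <- function(est) sqrt(mean((est - x0[id.na])^2))
c(filter = rmse(y.filter), smoother = rmse(y.smooth))
```

In practice the true values are of course unknown, so a comparison like this is only possible when gaps are introduced artificially, as in this example.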

Attribution
Source: Link, Question Author: user3730957, Answer Author: javlacalle
