# How to use auto.arima to impute missing values

I have a zoo series with many missing values. I read that auto.arima can impute these missing values. Can anyone show me how to do it? Thanks a lot!

This is what I have tried, but without success:

```r
fit <- auto.arima(tsx)
plot(forecast(fit))
```


First, be aware that forecast computes out-of-sample predictions, whereas you are interested in the in-sample observations that are missing.

The Kalman filter handles missing values. Thus you can take the state-space form of the ARIMA model from the output returned by forecast::auto.arima or stats::arima (the element fit$model) and pass it to stats::KalmanRun.
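To see why the filter copes with gaps, here is a minimal, self-contained sketch in base R of a Kalman filter for the local level model (random walk plus noise), whose reduced form is an ARIMA(0,1,1). It is a toy stand-in for the general state-space recursion run by stats::KalmanRun, not its actual code; the function name local_level_filter and the variances h and q are hypothetical. When y[t] is NA, the update step is simply skipped and the one-step prediction is carried forward.

```r
# Toy Kalman filter for the local level model (hypothetical helper,
# not the code of stats::KalmanRun):
#   y[t]  = mu[t] + eps[t],     eps[t] ~ N(0, h)
#   mu[t] = mu[t-1] + eta[t],   eta[t] ~ N(0, q)
local_level_filter <- function(y, h, q, a0 = 0, P0 = 1e7) {
  n <- length(y)
  a <- numeric(n)  # filtered state mean a[t|t]
  P <- numeric(n)  # filtered state variance P[t|t]
  a_prev <- a0
  P_prev <- P0     # large P0 approximates a diffuse start
  for (t in seq_len(n)) {
    # prediction step: always performed
    a_pred <- a_prev
    P_pred <- P_prev + q
    if (is.na(y[t])) {
      # missing observation: no update, keep the prediction
      a[t] <- a_pred
      P[t] <- P_pred
    } else {
      # update step with the observed value
      K <- P_pred / (P_pred + h)
      a[t] <- a_pred + K * (y[t] - a_pred)
      P[t] <- P_pred * (1 - K)
    }
    a_prev <- a[t]
    P_prev <- P[t]
  }
  list(states = a, var = P)
}

y <- c(1.0, 1.2, NA, NA, 1.5, 1.4)
kf <- local_level_filter(y, h = 0.1, q = 0.05)
kf$states  # entries 3 and 4 equal entry 2: flat extrapolation over the gap
kf$var     # the variance grows by q at each missing step
```

Over the gap the state estimate stays flat while its uncertainty grows; that prediction, mapped through the observation equation, is what serves as the imputed value.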

Edit: the code was fixed based on the answer by @stats0007.

In a previous version I took only the column of the filtered states that corresponds to the observed series; however, I should use the entire state matrix and apply the observation equation, $y_t = Z \alpha_t$. (Thanks to @stats0007 for the comments.) The code and plots below are updated accordingly.
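For reference, a sketch of the state-space form this relies on, written in the notation of the R help page ?KalmanLike as I understand it (the symbols $Z$, $T$, $h$ and $V = RQR'$ are meant to correspond to the components of fit$model; treat that mapping as an assumption):

```latex
\begin{aligned}
y_t &= Z\alpha_t + \varepsilon_t, & \varepsilon_t &\sim N(0, h),\\
\alpha_t &= T\alpha_{t-1} + R e_t, & e_t &\sim N(0, Q), \quad V = RQR',\\[4pt]
\hat{y}_t^{\,\text{filter}} &= Z a_{t \mid t}, & \hat{y}_t^{\,\text{smooth}} &= Z a_{t \mid n}.
\end{aligned}
```

Here $a_{t \mid t}$ is the filtered state (row $t$ of kr$states) and $a_{t \mid n}$ the smoothed state given all $n$ observations (row $t$ of kr$smooth). When $y_t$ is missing, the update step is skipped, so $a_{t \mid t} = T a_{t-1 \mid t-1}$. For models built by stats::arima I believe $h = 0$, so at observed time points $Z a_{t \mid t}$ simply reproduces $y_t$; only the missing positions actually receive new values.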

I use a ts object as the sample series instead of a zoo object, but the procedure should be the same:
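Since the question starts from a zoo series, one way to obtain the ts object used below is to convert it explicitly first. A minimal sketch, assuming the zoo package is installed and the series is regularly spaced; tsx here is a hypothetical stand-in built from the same data, not the asker's series:

```r
library(zoo)

# Hypothetical stand-in for the question's `tsx`: a regular monthly
# zoo series (built with zooreg) containing some NA values.
v <- as.numeric(log(AirPassengers))
v[c(10, 60:71, 100, 130)] <- NA
tsx <- zooreg(v, start = c(1949, 1), frequency = 12)

# Convert to ts; for a regular series the frequency is kept, so the
# seasonal period is visible to auto.arima / arima.
x <- as.ts(tsx)

frequency(x)   # seasonal period seen by the model
sum(is.na(x))  # missing values are carried over as NA
```

Once the series has been completed, as.zoo() turns the ts back into a zoo object if you need one.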

```r
library(forecast)
# sample series
x0 <- x <- log(AirPassengers)
y <- x
# set some missing values
x[c(10,60:71,100,130)] <- NA
# fit model
fit <- auto.arima(x)
# Kalman filter
kr <- KalmanRun(x, fit$model)
# impute missing values: Z %*% alpha at each missing observation
id.na <- which(is.na(x))
for (i in id.na)
  y[i] <- fit$model$Z %*% kr$states[i,]
# alternative to the explicit loop above
y[id.na] <- sapply(id.na, FUN = function(i, Z, alpha) Z %*% alpha[i,],
  Z = fit$model$Z, alpha = kr$states)
y[id.na]
# [1] 4.767653 5.348100 5.364654 5.397167 5.523751 5.478211 5.482107 5.593442
# [9] 5.666549 5.701984 5.569021 5.463723 5.339286 5.855145 6.005067
```

You can plot the result (for the whole series, and for the year with missing observations in the middle of the sample):

```r
par(mfrow = c(2, 1), mar = c(2.2, 2.2, 2, 2))
# whole series: true values (blue) vs imputed values (red)
plot(x0, col = "gray")
lines(x)
points(time(x0)[id.na], x0[id.na], col = "blue", pch = 19)
points(time(y)[id.na], y[id.na], col = "red", pch = 17)
legend("topleft", legend = c("true values", "imputed values"),
  col = c("blue", "red"), pch = c(19, 17))
# zoom on the year with missing observations
plot(time(x0)[60:71], x0[60:71], type = "b", col = "blue",
  pch = 19, ylim = range(x0[60:71], y[60:71]))
points(time(y)[60:71], y[60:71], col = "red", pch = 17)
lines(time(y)[60:71], y[60:71], col = "red")
legend("topleft", legend = c("true values", "imputed values"),
  col = c("blue", "red"), pch = c(19, 17), lty = c(1, 1))
```

You can repeat the same example using the Kalman smoother instead of the Kalman filter. All you need to change are these lines:

```r
kr <- KalmanSmooth(x, fit$model)
y[i] <- fit$model$Z %*% kr$smooth[i,]
# and, in the sapply alternative, pass alpha = kr$smooth
```


Handling missing observations with the Kalman filter is sometimes interpreted as extrapolation of the series, because each imputed value uses only the observations up to that point; when the Kalman smoother is used, missing observations are said to be filled in by interpolation, because the whole observed series, before and after each gap, is used.
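A minimal sketch that puts the two approaches side by side on the same example. As assumptions: it uses stats::arima with the fixed "airline" order (0,1,1)(0,1,1)[12] instead of forecast::auto.arima so that it needs only base R, and impute_kalman is a hypothetical helper name. It applies the observation equation to all rows at once (states %*% Z) instead of looping, and overwrites only the NA positions.

```r
# Compare filter-based (extrapolation) and smoother-based (interpolation)
# imputation. stats::arima with a fixed airline order stands in for
# forecast::auto.arima (an assumption, to avoid package dependencies).
x0 <- log(AirPassengers)
x <- x0
id.na <- c(10, 60:71, 100, 130)
x[id.na] <- NA

fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# hypothetical helper: fill only the NA positions with Z %*% alpha_t
impute_kalman <- function(x, model, smooth = FALSE) {
  states <- if (smooth) {
    KalmanSmooth(x, model)$smooth  # smoothed states: use the whole sample
  } else {
    KalmanRun(x, model)$states     # filtered states: use data up to t
  }
  yhat <- drop(states %*% model$Z)  # observation equation y_t = Z alpha_t
  out <- x
  out[is.na(x)] <- yhat[is.na(x)]
  out
}

y.filter <- impute_kalman(x, fit$model, smooth = FALSE)
y.smooth <- impute_kalman(x, fit$model, smooth = TRUE)

# mean absolute error at the imputed positions
mae <- c(filter = mean(abs(y.filter[id.na] - x0[id.na])),
         smooth = mean(abs(y.smooth[id.na] - x0[id.na])))
print(mae)
```

This sketch assumes the model has no mean, drift, or other regressors: as I understand it, fit$model otherwise describes the series after subtracting the regression part, which would then have to be added back to the imputed values.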