Predictions from BSTS model (in R) are failing completely

After reading this blog post about Bayesian structural time series models, I wanted to look at implementing this in the context of a problem I’d previously used ARIMA for.

I have some data with some known (but noisy) seasonal components – there are definitely an annual, monthly and weekly components to this, and also some effects due to special days (such as federal or religious holidays).

I have used the bsts package to implement this and as far as I can tell I haven’t done anything wrong, although the components and prediction simply don’t look as I’d expect. It isn’t clear to me if my implementation is wrong, incomplete or has some other problem.

The full time series looks like this:

Full data

I can train the model on some subset of the data, and the model generally looks good in terms of the fit (plot is below). The code I am using to do this is here:

library(bsts)

predict_length = 90
training_cut_date <- '2015-05-01'
test_cut_date <- as.Date(training_cut_date) + predict_length

df = read.csv('input.tsv', sep ='\t')

df$date <- as.Date(as.character(df$date),format="%Y-%m-%d")
df_train = df[df$date < training_cut_date,]

yts <- xts(log10(df_train$count), order.by=df_train$date)

ss <- AddLocalLinearTrend(list(), yts)
ss <- AddSeasonal(ss, yts, nseasons = 7)
ss <- AddSeasonal(ss, yts, nseasons = 12)
ss <- AddNamedHolidays(ss, named.holidays = NamedHolidays(), yts)

model <- bsts(yts, state.specification = ss, niter = 500, seed=2016)

The model looks reasonable:

Model Plot

But if I plot the prediction then firstly the trend is completely wrong, and secondly the uncertainty grows VERY quickly – to the point where I can’t show the uncertainty band on the same plot as the predictions without making the y axis on a log-scale. The code for this part is here:

burn <- SuggestBurn(0.1, model)
pred <- predict(model, horizon = predict_length, burn = burn, quantiles = c(.025, .975))

The pure prediction looks like this:

pure prediction

And then when scaled back to the initial distribution (with the dotted line showing the transition from training to prediction, the problems are obvious:

full distro

I have tried adding more seasonal trends, removing seasonal trends, adding an AR term, changing the AddLocalLinearModel to AddGeneralizedLocalLinearTrend and several other things concerning tweaking the model, but nothing has resolved the issues and made the predictions more meaningful. In some cases the direction changes, so rather than dropping to 0 the prediction just continues to increase as a function of time. I definitely don’t understand why the model is breaking down in this way. Any suggestions would be very welcome.

Answer

Steve Scott here. I wrote the bsts package. I have a few suggestions for you. First, your seasonal components aren’t doing what you think they are. I think you have daily data, because you’re trying to add a 7 season component, which should be working correctly. But you’ve told your annual seasonal component to repeat every 12 days. Getting a monthly seasonal component with daily data is kind of hard to do, but you can do a 52 week seasonal by
AddSeasonal(..., nseasons = 52, season.duration = 7).

The seasonal.duration argument tells the model how many time points each season should last for. The nseasons argument tells it how many seasons are in a cycle. The total number of time points in a cycle is season.duration * nseasons.

The second suggestion is that you might want to think about a different model for trend. The LocalLinearTrend model is very flexible, but this flexibility can show up as undesired variance in long term forecasts. There are some other trend models that contain a bit more structure. GeneralizedLocalLinearTrend (sorry about the nondescriptive name) assumes the “slope” component of trend is an AR1 process instead of a random walk. It is my default option if I want to forecast far into the future. Most of your time series variation seems to come from seasonality, so you might try AddLocalLevel or even AddAr instead of AddLocalLinearTrend.

Finally, in general if you’re getting strange forecasts, and you want to figure out which part of the model is to blame, try plot(model, "components") to see the model decomposed into the individual pieces you’ve requested.

Attribution
Source : Link , Question Author : anthr , Answer Author : Jan Kukacka

Leave a Comment