I have a fairly predictable daily time series with weekly seasonality. I am able to come up with predictions that appear to be pretty accurate (confirmed by cross-validation) when there are no holidays. However, when there are holidays, I have the following issues:
- I get non-zero numbers for the holidays in my forecast, even though all historical holidays are 0. This really isn’t the main issue though. The issue is…
- Since processing that doesn’t occur on holidays “spills over” to the days following the holidays, a simple dummy variable does not cut it, as these appear to be short-term innovational outliers. If there were no weekly seasonality, I could perhaps come up with an estimate for distributing the data left unprocessed on the holiday over the five or so days following it (as suggested in “How do you create variables reflecting the lead and lag impact of holidays / calendar effects in a time-series analysis?”). However, the distribution of the “spill over” depends on the day of the week the holiday falls on, and on whether the holiday is Christmas or Thanksgiving, when orders are placed at a lower rate than during the rest of the year.
Here are a few snapshots from my cross-validation that show the predicted (blue) vs the actual (red) outcome for holidays that appear on different days of the week:
I also worry that the impact of Christmas depends on the day of the week it falls on, and I only have six or so years of historical data.
Does anyone have any suggestions for how to deal with these types of innovational outliers in the context of forecasting? (Unfortunately I can’t share any data)
Couldn’t you create a dummy variable for holiday, one for holiday+1 and one for holiday+2 and only set them to 1 as long as they fall on a weekday?
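To make that concrete, here is a minimal sketch in plain Python of how such dummies could be built. The function name `holiday_dummies`, the column names, and the `max_lag` parameter are my own choices for illustration, and I am reading “holiday+1” as the next calendar day rather than the next business day.

```python
from datetime import date, timedelta

def holiday_dummies(days, holidays, max_lag=2):
    """Return one dict of 0/1 flags per day; lag 0 is the holiday itself."""
    holidays = set(holidays)
    rows = []
    for day in days:
        row = {"date": day}
        for lag in range(max_lag + 1):
            name = "holiday" if lag == 0 else f"holiday_plus_{lag}"
            # Was there a holiday exactly `lag` calendar days before this day?
            is_hit = (day - timedelta(days=lag)) in holidays
            # Lagged flags are only switched on for weekdays (Mon=0 .. Fri=4).
            if lag > 0 and day.weekday() >= 5:
                is_hit = False
            row[name] = int(is_hit)
        rows.append(row)
    return rows

# Example: 4 July 2013 fell on a Thursday, so holiday+1 is a Friday
# and holiday+2 is a Saturday (which stays 0).
start = date(2013, 7, 3)
days = [start + timedelta(days=i) for i in range(6)]
rows = holiday_dummies(days, [date(2013, 7, 4)])
```

If the spill-over really follows business days, you could count the k-th working day after the holiday instead of the k-th calendar day; that is a variation on the same idea, not something I have tested on your kind of data.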
As for Thanksgiving and Christmas, introducing separate dummy variables for these holidays seems to be your worst-case option, since you only have six years of data. To a certain extent, though, it might be your only option: people simply behave differently on those holidays than they do on, say, the Fourth of July. If you are studying something like retail sales patterns, you have to live with those being “special” holidays and would want to analyze them separately. However, maybe the ideas below are helpful to you:
- Thanksgiving. Shouldn’t the fact that it always falls on the same day of the week (Thursday) make it easier? I.e. a Thanksgiving dummy might just be workable even in a six-year data set because the weekday pattern will always be the same.
- Christmas. From looking at your graph, it appears to me that the main issue is that the effect lasts longer than after other holidays. If you define “Christmas” as Christmas Eve (Dec 24th), that will be because many people also stay home on Christmas Day (Dec 25th), and in some places on Boxing Day (Dec 26th) as well. I’ll think some more about this.
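The two ideas above could be sketched as follows. This assumes US Thanksgiving (the fourth Thursday of November) and treats Dec 24th to 26th as a single “Christmas window”; the function names, the flag names, and the exact window are assumptions on my part that you would need to adjust to your data.

```python
from datetime import date, timedelta

def thanksgiving(year):
    """US Thanksgiving: the fourth Thursday of November."""
    first = date(year, 11, 1)
    # Days from Nov 1 to the first Thursday (Thursday is weekday 3).
    offset = (3 - first.weekday()) % 7
    return first + timedelta(days=offset + 21)

def special_holiday_flags(day):
    """0/1 flags for the two 'special' holidays discussed above."""
    return {
        "thanksgiving": int(day == thanksgiving(day.year)),
        # Assumed window: Christmas Eve through Boxing Day.
        "christmas_window": int(day.month == 12 and 24 <= day.day <= 26),
    }

flags = special_holiday_flags(date(2014, 11, 27))
```

Because Thanksgiving always lands on a Thursday, a single dummy (plus, if needed, a lagged one for the following Friday) sees the same weekday pattern in every year of the sample. The Christmas window does not have that property, so with six years of data its estimate will likely be noisier.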
I hope this helps.