I’m using a daily time series of sales data covering about 2 years. Based on some online tutorials and examples, I tried to identify the seasonality in the data. There seems to be weekly, monthly and probably yearly seasonality.
For example, there are payday effects, particularly around the 1st payday of the month, that last for a few days during the week. There are also some specific holiday effects, clearly identifiable from the observations.
Equipped with some of these observations, I tried the following:
1. auto.arima (from the R forecast package), using regressors and default values for the other arguments. The regressor I created is a matrix of 0/1 values:
- 11 month (n-1) variables
- 12 holiday variables
- Payday: I could not figure this part out, since the effect is more complicated than I thought. It works differently depending on the weekday on which the 1st of the month falls.
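One possible way to encode a payday effect that depends on the weekday of the 1st is to interact a "first days of the month" window with that weekday. The sketch below is in Python (standard library only) purely for illustration; the function name `payday_dummies`, the 3-day window and the 7-column layout are assumptions, not something derived from the data. The resulting 0/1 columns could be appended to the matrix passed as `xreg` to auto.arima.

```python
from datetime import date, timedelta

def payday_dummies(start, n_days, window=3):
    """Return (dates, rows): one 0/1 row of 7 columns per day.

    Column k is 1 when the day falls within the first `window` days of
    its month AND the 1st of that month was weekday k (0=Mon .. 6=Sun).
    `window=3` is an assumed length for the payday effect.
    """
    dates = [start + timedelta(days=i) for i in range(n_days)]
    rows = []
    for d in dates:
        row = [0] * 7
        if d.day <= window:
            first_weekday = d.replace(day=1).weekday()
            row[first_weekday] = 1
        rows.append(row)
    return dates, rows

# Roughly two years of daily rows, starting on an assumed first date.
dates, X = payday_dummies(date(2011, 1, 3), 730)
```

Each row has at most one 1, so the columns separate "early in a month that started on a Monday" from "early in a month that started on a Saturday", and so on. Columns that are rarely 1 in two years of data will be poorly estimated.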
I set the frequency of the time series to 7 (i.e., weekly seasonality). I tested the model by forecasting 7 days at a time. The results are reasonable: over an 11-week test, the weekly average RMSE comes to 5%.
2. TBATS model (from the R forecast package), using multiple seasonal periods (7, 30.4375, 365.25) and no regressors, since tbats does not accept them. The accuracy is surprisingly better than the ARIMA model's, at a weekly average RMSE of 3.5%.
In this case, the model without ARMA errors performs slightly better. If I then apply the coefficients for just the holiday effects from the ARIMA model described in #1 to the results of the TBATS model, the weekly average RMSE improves to 2.95%.
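The adjustment described here amounts to adding the estimated holiday coefficients to the TBATS point forecasts on the days where a holiday dummy is 1. A minimal sketch in Python, with a hypothetical helper `add_holiday_effects` and made-up numbers that are not taken from the actual models:

```python
def add_holiday_effects(base_forecast, holiday_rows, coefs):
    """Add sum_j coefs[j] * holiday_rows[t][j] to each forecast value."""
    return [
        f + sum(c * h for c, h in zip(coefs, row))
        for f, row in zip(base_forecast, holiday_rows)
    ]

# Made-up illustration: 3 forecast days, 2 holiday dummies.
base = [100.0, 110.0, 120.0]      # stand-in for TBATS point forecasts
H = [[0, 0], [1, 0], [0, 1]]      # 0/1 holiday dummies per forecast day
coefs = [-30.0, 15.0]             # stand-in for ARIMA holiday coefficients
adjusted = add_holiday_effects(base, H, coefs)
```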
Without much background in the theory underlying these models, I’m unsure whether this TBATS approach is even valid. Although it improves the RMSE significantly in the 11-week test, I wonder whether it can sustain this accuracy in the future, and whether applying holiday effects from the ARIMA model to the TBATS results is justifiable. Any thoughts from contributors will be highly appreciated.
Note: Do “Save Link As”, to download the file.
To gauge an approach, you should evaluate models and forecasts from different origins and across different horizons, not rely on one number.
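As a rough illustration of evaluating from several origins and horizons, the Python sketch below refits or reapplies a forecaster at successive origins and reports RMSE separately for each horizon. The names `rolling_origin_rmse` and `seasonal_naive` are hypothetical, the seasonal naive forecaster is only a stand-in for whatever model is being assessed, and the toy series is synthetic.

```python
import math

def seasonal_naive(history, h, period=7):
    """Stand-in forecaster: repeat the last observed `period` values."""
    last = history[-period:]
    return [last[i % period] for i in range(h)]

def rolling_origin_rmse(y, forecaster, first_origin, h=7, step=7):
    """RMSE per horizon 1..h, pooled over all forecast origins.

    Origins are first_origin, first_origin + step, ... for as long as
    h future points remain. Returns (rmse_by_horizon, n_origins).
    """
    sq = [0.0] * h
    n = 0
    for origin in range(first_origin, len(y) - h + 1, step):
        fc = forecaster(y[:origin], h)
        actual = y[origin:origin + h]
        for k in range(h):
            sq[k] += (fc[k] - actual[k]) ** 2
        n += 1
    if n == 0:
        raise ValueError("series too short for the requested origins")
    return [math.sqrt(s / n) for s in sq], n

# Synthetic series: weekly pattern plus a linear trend of 1 unit per day.
y = [100 + 10 * (t % 7) + t for t in range(140)]
rmse_by_horizon, n_origins = rolling_origin_rmse(y, seasonal_naive, first_origin=70)
```

Reporting the per-horizon values, rather than one pooled figure, shows whether a model that looks good on average is weak at particular lead times.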
I assume that your data is from the US. I prefer 3+ years of daily data, because with only two years a holiday can land on a weekend both times, leaving no weekday read. It looks like your Thanksgiving impact is a day off in 2012, or there was a recording error of some kind that caused the model to miss the Thanksgiving Day effect.
Januarys are typically low in the dataset if you look at them as a % of the year. Weekends are high. The dummies reflect this behavior: MONTH_EFF01, FIXED_EFF_N10507, FIXED_EFF_N10607.
I have found that using an AR component with daily data assumes that the day-of-the-week pattern of the last two weeks is how the pattern is in general, which is a big assumption. We started with 11 monthly dummies and 6 daily dummies; some dropped out of the model. B**1 means that there is a lag impact the day after a holiday. There were 6 special days of the month (days 2, 3, 5, 21, 29, 30; 21 might be spurious?), 3 time trends, 2 seasonal pulses (where a day of the week started deviating from the typical: a 0 before this date and a 1 every 7th day after) and 2 outliers (note the Thanksgiving!). This took just under 7 minutes to run. Download all results here: www.autobox.com/se/dd/daily.zip
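For readers unfamiliar with these regressor types, here is a minimal Python sketch of how each could be built as a plain column. The function names are hypothetical, indices are 0-based, and the exact definitions Autobox uses internally (in particular for the time trend) may differ.

```python
def pulse(n, t0):
    """One-off outlier: 1 at index t0, 0 elsewhere."""
    return [1 if t == t0 else 0 for t in range(n)]

def seasonal_pulse(n, t0, period=7):
    """0 before t0, then 1 at t0 and every `period`-th day after."""
    return [1 if t >= t0 and (t - t0) % period == 0 else 0 for t in range(n)]

def time_trend(n, t0):
    """Assumed form: 0 before t0, then 1, 2, 3, ... from t0 onward."""
    return [max(0, t - t0 + 1) for t in range(n)]

def shift(x, k):
    """Apply B**k to a dummy: k > 0 lags it (effect k days after the
    event), k < 0 leads it (effect |k| days before the event)."""
    n = len(x)
    return [x[t - k] if 0 <= t - k < n else 0 for t in range(n)]

# A holiday on day 4 of a 10-day window, its day-after lag (B**1)
# and a 3-day lead (B**-3).
holiday = pulse(10, 4)
day_after = shift(holiday, 1)
three_before = shift(holiday, -3)
```

With this convention, a term such as B**-3 on a holiday dummy represents an effect three days before the holiday, and B**1 an effect on the day after.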
It includes a quick and dirty XLS sheet to check to see if the model makes sense. Of course, the XLS % are in fact bad as they are crude benchmarks.
Try estimating this model:
Y(T) = .53169E+06
 +[X1(T)][(+ .13482E+06B** 1)] M_HALLOWEEN
 +[X2(T)][(+ .17378E+06B**-3)] M_JULY4TH
 +[X3(T)][(- .11556E+06)] M_MEMORIALDAY
 +[X4(T)][(- .16706E+06B**-4+ .13960E+06B**-3- .15636E+06B**-2 - .19886E+06B**-1)] M_NEWYEARS
 +[X5(T)][(+ .17023E+06B**-2- .26854E+06B**-1- .14257E+06B** 1)] M_THANKSGIVI
 +[X6(T)][(- 71726. )] MONTH_EFF01
 +[X7(T)][(+ 55617. )] MONTH_EFF02
 +[X8(T)][(+ 27827. )] MONTH_EFF03
 +[X9(T)][(- 37945. )] MONTH_EFF09
 +[X10(T)[(- 23652. )] MONTH_EFF10
 +[X11(T)[(- 33488. )] MONTH_EFF11
 +[X12(T)[(+ 39389. )] FIXED_EFF_N10107
 +[X13(T)[(+ 63399. )] FIXED_EFF_N10207
 +[X14(T)[(+ .13727E+06)] FIXED_EFF_N10307
 +[X15(T)[(+ .25144E+06)] FIXED_EFF_N10407
 +[X16(T)[(+ .32004E+06)] FIXED_EFF_N10507
 +[X17(T)[(+ .29156E+06)] FIXED_EFF_N10607
 +[X18(T)[(+ 74960. )] FIXED_DAY02
 +[X19(T)[(+ 39299. )] FIXED_DAY03
 +[X20(T)[(+ 27660. )] FIXED_DAY05
 +[X21(T)[(- 33451. )] FIXED_DAY21
 +[X22(T)[(+ 43602. )] FIXED_DAY29
 +[X23(T)[(+ 68016. )] FIXED_DAY30
 +[X24(T)[(+ 226.98 )] :TIME TREND 1 1/ 1 1/ 3/2011 I~T00001__010311stack
 +[X25(T)[(- 133.25 )] :TIME TREND 423 61/ 3 2/29/2012 I~T00423__010311stack
 +[X26(T)[(+ 164.56 )] :TIME TREND 631 91/ 1 9/24/2012 I~T00631__010311stack
 +[X27(T)[(- .42528E+06)] :SEASONAL PULSE 733 105/ 5 1/ 4/2013 I~S00733__010311stack
 +[X28(T)[(- .33108E+06)] :SEASONAL PULSE 370 53/ 6 1/ 7/2012 I~S00370__010311stack
 +[X29(T)[(- .82083E+06)] :PULSE 326 47/ 4 11/24/2011 I~P00326__010311stack
 +[X30(T)[(+ .17502E+06)] :PULSE 394 57/ 2 1/31/2012 I~P00394__010311stack
 + [A(T)]