From what I’ve read, Facebook’s Prophet basically breaks a time series down into trend and seasonality components. For example, an additive model would be written as:
y(t) = g(t) + s(t) + h(t) + e_t
- t the time
- g(t) the trend (either linear or logistic)
- s(t) the seasonality (daily, weekly, yearly…)
- h(t) the holidays
- e_t the error
My questions are: Couldn’t this be done with a simple linear regression?
What would be the differences in terms of results if we compared them, and why?
The issue here is to arrive at an equation that separates the observed data into signal and noise. If your data is simple, then your regression approach might work. Take care to understand the assumptions that Prophet makes. It also helps to understand what Prophet actually does, as it doesn’t just fit a simple model but attempts to add some structure.
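To make the "simple regression" idea concrete, here is a minimal sketch of the additive model fitted by ordinary least squares, with a linear g(t) and a sine/cosine s(t). The synthetic data, the weekly period of 7, the Fourier order of 2 and the helper name `design_matrix` are all my own assumptions for illustration; this is not Prophet's code, and it has no holidays, no trend breakpoints and no priors.

```python
import numpy as np

def design_matrix(t, period=7.0, order=2):
    """Columns: intercept, linear trend, then sin/cos pairs for one seasonal period."""
    cols = [np.ones_like(t), t]
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Synthetic series (assumed for illustration): linear trend + weekly cycle + Gaussian noise
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
y = 5.0 + 0.3 * t + 2.0 * np.sin(2.0 * np.pi * t / 7.0) + rng.normal(0.0, 0.5, t.size)

# Ordinary least squares: y(t) ~ g(t) + s(t), with e_t left in the residuals
X = design_matrix(t)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("trend slope:", beta[1])
print("first-harmonic sine coefficient:", beta[2])
```

On data this clean, the regression recovers the slope and the weekly amplitude well. The differences from Prophet show up once the trend changes or the errors are not simple noise.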
For example, some reflections that I made after reading their well-written introduction might help you in your evaluation. I apologize in advance if I have misunderstood their approach, and would like to be corrected if so.
1) Their lead example has two break-points in trend but they only captured the most obvious one.
2) They ignore any and all ARIMA structure reflecting omitted stochastic series or the value of using historical values of Y to guide the forecast.
3) They ignore any possible dynamics (lead and lag effects) of user-suggested stochastic and deterministic series. Prophet’s causal regression effects are purely contemporaneous.
4) No attempt is made to identify step/level shifts in the series or seasonal pulses, e.g. a change in the MONDAY EFFECT halfway through time due to some unknown external event. Prophet assumes “simple linear growth” rather than validating it by examining alternative possibilities. For a possible example of this, see “Forecasting recurring orders for an online subscription business using Facebook Prophet and R”.
5) Sines and cosines are an opaque way of dealing with seasonality, while seasonal effects such as day-of-the-week, day-of-the-month, week-of-the-month, and month-of-the-year are much more effective/informative when dealing with anthropogenic (human-driven) effects.
Suggesting frequencies of 365.25 for yearly patterns makes little sense because we don’t perform the same action on the exact same day as we did last year, while monthly activity is much more persistent, but Prophet doesn’t appear to offer the 11 monthly indicators option. Weekly frequencies of 52 make little sense because we don’t have 52 weeks in each and every year.
6) No attempt is made to validate that the error process is Gaussian, so that meaningful tests of significance can be made.
7) No concern for whether the model error variance is homogeneous, i.e., not changing deterministically at particular points in time, which would suggest Weighted Least Squares. No concern for finding an optimal power transform to deal with the error variance being proportional to the expected value; see “When (and why) should you take the log of a distribution (of numbers)?”.
8) User has to pre-specify all possible lead and lag effects around events/holidays. For example, daily sales often start to increase in late November, reflecting a long-term effect of Christmas.
9) No concern for whether the resulting errors are free of structure; diagnostic checking for sufficiency would suggest ways to improve the model.
10) Apparently no concern with improving the model by deleting non-significant structure.
11) There is no facility to obtain a family of simulated forecasts where confidence limits may not necessarily be symmetrical via bootstrapping the model’s errors with the allowance of possible anomalies.
12) Letting the user make assumptions about trends (the number of trend breakpoints and the actual breakpoints) allows unwanted/unusable flexibility in a tool which, by its name, is designed for hands-free large-scale applications.
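Points 2 and 9 can be illustrated with a small diagnostic check. The sketch below is my own toy example, not anything taken from Prophet: it fits a plain trend regression to a synthetic series whose errors follow an AR(1) process (coefficient 0.7, an assumed value), then tests the residuals for leftover autocorrelation with the Ljung-Box Q statistic. The helper names `autocorr` and `ljung_box_q` are hypothetical, and the critical value 18.307 is the 95% point of a chi-square distribution with 10 degrees of freedom, used without any adjustment for fitted parameters.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: n(n+2) * sum_k r_k^2 / (n-k) for k = 1..max_lag."""
    n = x.size
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))

# Synthetic series (assumed for illustration): linear trend + AR(1) errors
rng = np.random.default_rng(1)
n = 300
t = np.arange(n, dtype=float)
shocks = rng.normal(0.0, 1.0, n)
errors = np.zeros(n)
for i in range(1, n):
    errors[i] = 0.7 * errors[i - 1] + shocks[i]
y = 10.0 + 0.05 * t + errors

# Trend-only regression, which ignores the memory in the errors
X = np.column_stack([np.ones(n), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

CHI2_95_DF10 = 18.307  # 95% critical value, chi-square with 10 degrees of freedom
q_stat = ljung_box_q(resid, max_lag=10)
has_structure = q_stat > CHI2_95_DF10

print("lag-1 residual autocorrelation:", autocorr(resid, 1))
print("Ljung-Box Q(10):", q_stat, "-> structure left in residuals:", has_structure)
```

When the check flags structure like this, the residuals still carry information that historical values of Y could supply, which is the gap I am describing for a model that contains only trend, seasonality and holiday terms.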