MAD = Mean Absolute Deviation
MSE = Mean Squared Error
I’ve seen suggestions from various places that MSE is used despite some undesirable qualities (e.g. http://www.stat.nus.edu.sg/~staxyc/T12.pdf, which states on p8 “It is commonly believed that MAD is a better criterion than MSE. However, mathematically MSE is more convenient than MAD.”)
Is there more to it than that? Is there a paper that thoroughly analyzes the situations in which various methods of measuring forecast error are more or less appropriate? My Google searches haven’t revealed anything.
A similar question to this was asked at https://stackoverflow.com/questions/13391376/how-to-decide-the-forecasting-method-from-the-me-mad-mse-sde, and the user was asked to post on stats.stackexchange.com, but I don’t think they ever did.
To decide which point forecast error measure to use, we need to take a step back. Note that we don’t know the future outcome perfectly, nor will we ever. So the future outcome follows a probability distribution. Some forecasting methods explicitly output such a full distribution, and some don’t – but it is always there, if only implicitly.
Now, we want to have a good error measure for a point forecast. Such a point forecast Ft is our attempt to summarize what we know about the future distribution (i.e., the predictive distribution) at time t using a single number, a so-called functional of the future density. The error measure then is a way to assess the quality of this single number summary.
So you should choose an error measure that rewards “good” one number summaries of (unknown, possibly forecasted, but possibly only implicit) future densities.
The challenge is that different error measures are minimized by different functionals. The expected MSE is minimized by the expected value of the future distribution. The expected MAD (also known as the mean absolute error, MAE) is minimized by the median of the future distribution. Thus, if you calibrate your forecasts to minimize the MAD, your point forecast will be the future median, not the future expected value, and your forecasts will be biased if your future distribution is not symmetric.
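To make the mean-versus-median point concrete, here is a minimal simulation sketch. The exponential distribution, the sample size, the seed and the candidate grid are all illustrative assumptions of mine, not something taken from a real forecasting problem. It searches a grid of constant point forecasts and reports which one minimizes each error measure on a skewed sample.

```python
import random
import statistics


def mse(forecast, outcomes):
    """Mean squared error of a constant point forecast."""
    return sum((y - forecast) ** 2 for y in outcomes) / len(outcomes)


def mae(forecast, outcomes):
    """Mean absolute error (MAD) of a constant point forecast."""
    return sum(abs(y - forecast) for y in outcomes) / len(outcomes)


# Assumption: a right-skewed "future" distribution, here exponential with rate 1.
rng = random.Random(0)
outcomes = [rng.expovariate(1.0) for _ in range(2000)]

# Candidate point forecasts on a grid from 0.00 to 3.00 in steps of 0.01.
candidates = [i / 100 for i in range(301)]
best_mse = min(candidates, key=lambda f: mse(f, outcomes))
best_mae = min(candidates, key=lambda f: mae(f, outcomes))

sample_mean = statistics.fmean(outcomes)
sample_median = statistics.median(outcomes)

print("MSE-optimal forecast:", best_mse, "sample mean:", sample_mean)
print("MAE-optimal forecast:", best_mae, "sample median:", sample_median)
```

Up to the grid resolution, the MSE-optimal forecast should land on the sample mean and the MAE-optimal forecast on the sample median, with the latter noticeably lower because the distribution is right-skewed.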
This is most relevant for count data, which are typically skewed. In extreme cases (say, Poisson distributed sales with a mean below log 2 ≈ 0.69, where log is the natural logarithm), the median is zero, so your MAE will be lowest for a flat zero forecast. See here or here or here for details.
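The Poisson case can be checked directly from the probability mass function, without simulation. The sketch below is only an illustration: the helper names, the candidate grid and the truncation point `k_max` are my own choices. It computes the expected absolute and squared errors of constant forecasts for a Poisson mean of 0.5 (below log 2) and, for contrast, a mean of 1.0.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_abs_error(forecast, lam, k_max=60):
    """Expected |Y - forecast| for Y ~ Poisson(lam), truncated at k_max."""
    return sum(abs(k - forecast) * poisson_pmf(k, lam) for k in range(k_max + 1))


def expected_sq_error(forecast, lam, k_max=60):
    """Expected (Y - forecast)^2 for Y ~ Poisson(lam), truncated at k_max."""
    return sum((k - forecast) ** 2 * poisson_pmf(k, lam) for k in range(k_max + 1))


def best_forecasts(lam):
    """Return the (MAE-optimal, MSE-optimal) forecasts on a grid from 0.0 to 2.0."""
    candidates = [i / 10 for i in range(21)]
    best_mae = min(candidates, key=lambda f: expected_abs_error(f, lam))
    best_mse = min(candidates, key=lambda f: expected_sq_error(f, lam))
    return best_mae, best_mse


low_mae, low_mse = best_forecasts(0.5)    # mean below log 2
high_mae, high_mse = best_forecasts(1.0)  # mean above log 2

print("mean 0.5 -> MAE-optimal:", low_mae, "MSE-optimal:", low_mse)
print("mean 1.0 -> MAE-optimal:", high_mae, "MSE-optimal:", high_mse)
```

With a mean of 0.5, more than half of the probability mass sits at zero, so the absolute error is minimized by forecasting zero even though the expected sales are 0.5.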
I give some more information and an illustration in “What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?” That thread considers the MAPE, but also other error measures, and it contains links to other related threads.
In the end, which error measure to use really depends on your Cost of Forecast Error, i.e., which kind of error is most painful. Without looking at the actual implications of forecast errors, any discussion about “better criteria” is basically meaningless.
Measures of forecast accuracy were a big topic in the forecasting community some years back, and they still pop up now and then. One very good article to look at is Hyndman & Koehler “Another look at measures of forecast accuracy” (2006).
Finally, one alternative is to calculate full predictive densities and assess these using proper scoring rules.
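As a rough illustration of what a proper scoring rule does, the sketch below uses the logarithmic score, which is one common proper scoring rule. The Poisson setting, the true mean of 0.5 and the candidate grid are assumptions chosen to match the count-data example above. It computes the expected negative log score of several Poisson predictive distributions when the outcomes really follow a Poisson distribution with mean 0.5.

```python
import math


def poisson_log_pmf(k, mu):
    """Log probability that a Poisson(mu) variable equals k."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)


def expected_log_loss(mu, true_lam, k_max=60):
    """Expected negative log score of a Poisson(mu) predictive distribution
    when outcomes follow Poisson(true_lam); the sum is truncated at k_max."""
    return sum(
        math.exp(poisson_log_pmf(k, true_lam)) * -poisson_log_pmf(k, mu)
        for k in range(k_max + 1)
    )


true_lam = 0.5  # assumed true mean of the outcomes
candidate_means = [i / 10 for i in range(1, 21)]  # predictive means 0.1 to 2.0
best_mean = min(candidate_means, key=lambda mu: expected_log_loss(mu, true_lam))

print("predictive mean with the best expected log score:", best_mean)
```

Because the log score is proper, the predictive distribution that matches the true one achieves the best expected score, so there is no incentive to report a distorted distribution the way a MAD-minimizing point forecast is pulled toward zero.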