# On George Box, Galit Shmueli and the scientific method?

(This question might seem like it is better suited for the Philosophy SE. I am hoping that statisticians can clarify my misconceptions about Box’s and Shmueli’s statements, hence I am posting it here).

George Box (of ARIMA fame) said:

“All models are wrong, but some are useful.”

Galit Shmueli, in her famous paper “To Explain or to Predict?”, argues (and cites others who agree with her) that:

Explaining and predicting are not the same, and some models do a good job of explaining even though they do a poor job of predicting.

I feel that these two principles are somehow contradictory.

If a model doesn’t predict well, is it useful?

More importantly, if a model explains well (but doesn’t necessarily predict well), then it has to be true (i.e., not wrong) in some sense. So how does that mesh with Box’s “all models are wrong”?

Finally, if a model explains well but doesn’t predict well, how is it even scientific? Most scientific demarcation criteria (verificationism, falsificationism, etc.) imply that a scientific statement has to have predictive power, or colloquially: a theory or model is correct only if it can be empirically tested (or falsified), which means that it has to predict future outcomes.

My questions:

• Are Box’s statement and Shmueli’s ideas indeed contradictory, or am I missing something, e.g., can a model lack predictive power yet still be useful?
• If the statements of Box and Shmueli are not contradictory, then what does it mean for a model to be wrong and not predict well, yet still have explanatory power? Put differently: if one takes away both correctness and predictive ability, what is left of a model?

What empirical validations are possible when a model has explanatory power but not predictive power? Shmueli mentions things like using the AIC for explanation and the BIC for prediction, etc., but I don’t see how that solves the problem. With predictive models, you can use the AIC, the BIC, $R^2$, $L_1$ regularization, etc., but ultimately out-of-sample testing and performance in production are what determine the quality of the model. For models that explain well, however, I don’t see how any loss function can ever truly evaluate them. The concept of underdetermination from philosophy of science seems pertinent here: for any given data set, one can always judiciously choose some distribution (or mixture of distributions) and loss function $L$ in such a way that they fit the data (and therefore can be claimed to explain it). Moreover, the threshold that $L$ should fall under for someone to claim that the model adequately explains the data is arbitrary (much like p-values: why $p < 0.05$ and not $p < 0.1$ or $p < 0.01$?).
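
To make the worry concrete, here is a minimal sketch (my own illustration, not something from Shmueli’s paper) of why in-sample fit alone cannot validate a model: a sufficiently flexible model can be made to fit a given sample closely, and only a held-out test set reveals whether it actually predicts. The data-generating process, the degree-9 polynomial, and the use of numpy/scikit-learn are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Hypothetical data-generating process: a simple linear signal plus noise.
x = rng.uniform(-1, 1, size=30).reshape(-1, 1)
y = 2.0 * x.ravel() + rng.normal(scale=0.5, size=30)

# Hold out the last 10 points so out-of-sample performance can be measured.
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

# A flexible degree-9 polynomial: it can be made to fit the training sample
# very closely (high in-sample R^2), which is the "judicious choice" worry above.
poly = PolynomialFeatures(degree=9)
model = LinearRegression().fit(poly.fit_transform(x_train), y_train)

r2_in = r2_score(y_train, model.predict(poly.transform(x_train)))
r2_out = r2_score(y_test, model.predict(poly.transform(x_test)))
print(f"in-sample R^2:     {r2_in:.3f}")   # typically close to 1
print(f"out-of-sample R^2: {r2_out:.3f}")  # typically far lower, possibly negative
```

The question remains how an analogous check could work for a purely explanatory model, where no such held-out test is available.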

• Based on the above, how can one objectively validate a model that explains well but doesn't predict well, given that out-of-sample testing is not possible?