What is the problem with post-hoc testing?

My statistics professor says so, and all the books I look at state it: post-hoc testing is unscientific. You must derive a hypothesis from theory first, and only then collect data and analyse it.

But I really don’t understand what the problem is.

Suppose I see sales figures for different car colors and, from the numbers of different-colored cars sold, form the hypothesis that the largest group of cars on the street should be white. So I sit at some street one day and note the colors of all the cars that pass me. Then I do some tests and find whatever.

Now, suppose I was bored and sat at some street one day and noted all the colors of all the cars that passed me. Since I love graphs, I plot a pretty histogram and find that white cars form the largest group. So I think that maybe most cars on the street are white and perform some tests.

How and why do the results or the interpretation of the results of the post-hoc test differ from those of the theory-driven* hypothesis test?

* What’s the name for the opposite of a post-hoc test, anyway?


I would like to add that most of our knowledge about the universe (the Earth moves around the Sun) is deduced post hoc from observation.

It seems to me that in physics it is perfectly okay to assume that it is not coincidence that the sun has been rising in the East for the last thousand years.

Answer

“You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!” Richard Feynman

I feel that I am not in a position to explain the deep technical aspects of this problem. However, I think many of them can be reduced to an intuition.

In the first setup you start with a hypothesis, which you then verify on new data (from the designed experiment). Studying the sales figures can lead you to a carefully designed experiment, where you really can decide in advance how strong your answer should be (statistical power, p-value thresholds, sample size, and much else).

In the second setup, the first problem is that you decide nothing in advance about the strength of the answer. The second problem is that extracting the hypothesis from the same sample you use for the test increases, in an uncontrollable way, the chance that random patterns are interpreted as valuable information. What you do is notice something (that white cars are in great number) and ask yourself whether it is significant. The point is that you selected only a notable fact visible in that sample and discarded all the other hypotheses you could have formed. By doing that you created favourable conditions for one hypothesis, and you broke the assumptions of most a priori statistical tests.

It is not scientific to behave as if you did not know about this leak and to pretend that you ran an experiment with all its assumptions intact when that is not true. What is scientific in this case is to use the post hoc analysis to formulate a hypothesis and then design a brand new experiment in order to test it.
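To make the intuition concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration, not something from the question: five colors that are all equally common (so any "significant" result is a false alarm), 100 cars per day, a one-sided exact binomial test at the 5% level, and the hypothetical helper names `binom_upper_tail`, `draw_counts` and `simulate`. It compares three procedures: testing a color fixed in advance, testing whichever color happens to be most frequent in the same sample, and picking the most frequent color on one day but testing it on a fresh day.

```python
import random
from math import comb


def binom_upper_tail(x, n, p):
    """Exact one-sided p-value P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def draw_counts(rng, n_cars, n_colors):
    """Count cars per color for one day; every color is equally likely."""
    counts = [0] * n_colors
    for _ in range(n_cars):
        counts[rng.randrange(n_colors)] += 1
    return counts


def simulate(n_days=2000, n_cars=100, n_colors=5, alpha=0.05, seed=0):
    """Return false positive rates for (planned, same-sample, fresh-sample) tests."""
    rng = random.Random(seed)
    p0 = 1 / n_colors
    # p-value for every possible count, computed once up front
    pvals = [binom_upper_tail(x, n_cars, p0) for x in range(n_cars + 1)]

    planned = same_sample = fresh = 0
    for _ in range(n_days):
        day1 = draw_counts(rng, n_cars, n_colors)

        # Planned: color 0 ("white") was chosen before looking at the data.
        if pvals[day1[0]] < alpha:
            planned += 1

        # Post hoc, same sample: test whichever color looks most frequent.
        best = day1.index(max(day1))
        if pvals[day1[best]] < alpha:
            same_sample += 1

        # Post hoc hypothesis, new experiment: test that color on a new day.
        day2 = draw_counts(rng, n_cars, n_colors)
        if pvals[day2[best]] < alpha:
            fresh += 1

    return planned / n_days, same_sample / n_days, fresh / n_days


if __name__ == "__main__":
    planned, same_sample, fresh = simulate()
    print(f"planned test:             {planned:.3f}")
    print(f"post hoc, same sample:    {same_sample:.3f}")
    print(f"post hoc, fresh sample:   {fresh:.3f}")
```

Under these assumptions the planned test and the fresh-sample test should stay at or below the nominal 5% level, while the same-sample test should raise false alarms well above it, even though the test itself is identical in all three cases. Only the way the hypothesis was chosen differs.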

Attribution
Source: Link, Question Author: Community, Answer Author: rapaio
