According to Popper, we cannot verify a hypothesis, due to the problem of induction – we can only aim to falsify it. If we repeatedly fail to falsify it, the hypothesis is said to be tentatively accepted. For Popper, science should consist of coming up with hypotheses and trying as hard as possible to falsify them.
In some introductions to statistical hypothesis testing, I have read that scientists aim to falsify the null hypothesis and that this is somehow in accordance with Popper’s theory of falsification. Here is a posting stating this view. (User @Stefan commented on that posting, making exactly my point.)
I have three questions:
- Doesn’t Popper say we should try to falsify the alternative hypothesis?
- Does falsifying the null hypothesis count as failed falsification of the alternative hypothesis?
- This might be mere semantic sophistry: shouldn’t scientists try to corroborate the null hypothesis rather than falsify it?
(If this posting belongs on the “philosophy” board, please move it there…)
I was also going to point to Deborah Mayo’s work, as linked in a comment. She is a Popper-influenced philosopher who has written a lot about statistical testing.
I’ll try to address the questions.
(1a) Popper didn’t think of statistical testing as formalising his approach at all. Mayo states that this is because Popper was not expert enough in statistics; he probably also would not have accepted an error probability of 5% or 1% as “falsification” (Mayo may have mentioned this somewhere as well, but I don’t remember).
(1b) There are different approaches for picking the null and alternative hypothesis. In some applications, the null hypothesis is a precise scientific theory of interest, and we check whether the data falsify it. This would be in line with Popper (at least if he allowed for nonzero error probabilities). In other approaches (in many areas this is found much more often), the null hypothesis formalises the idea that “nothing meaningful is going on”, and the alternative is what is of actual scientific interest. This would not be in line with Popper. (Also, the alternative is normally not specified precisely enough to imply conditions for falsification, not even statistical falsification.)
(2) According to the standard logic of statistical tests, the null hypothesis can be statistically falsified (i.e. with an error probability), but the alternative cannot. One can argue that an alternative is statistically falsified, but this basically amounts to running the test the other way round. For example, with H0: μ=0 and the alternative μ≠0, you cannot falsify the alternative (it allows for μ arbitrarily close to 0, which no data can distinguish from μ=0), but you could state that a meaningful deviation from μ=0 would actually be |μ|≥2, and in that case you may reject |μ|≥2 if x̄ is very close to zero. This makes sense if the power of the original test against |μ|≥2 is large enough that, were |μ|≥2 true, “x̄ close to zero” would be very unlikely. (This is related to Mayo’s concept of “severity”; in such a case we can say that |μ|<2 holds “with severity”.) We could also then say that we have “statistically falsified” |μ|≥2.
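To make the “running the test the other way round” idea concrete, here is a minimal sketch (my own illustration, not from Mayo): a z-test of H0: μ=0 with known σ, and a reversed test that rejects |μ|≥2 via two one-sided tests at the boundaries ±2 (the same logic as TOST equivalence testing). The function names and the choice σ=1, n=25 are hypothetical.

```python
# Hypothetical illustration: standard test of H0: mu = 0 versus a
# reversed test that statistically falsifies |mu| >= 2.
# Assumes known sigma; uses only the standard library.
import math
from statistics import NormalDist

def p_value_null_mu0(xbar, sigma, n):
    """Two-sided p-value for H0: mu = 0 (z-test, sigma known)."""
    z = xbar / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_abs_mu_ge_2(xbar, sigma, n, alpha=0.05):
    """Reject |mu| >= 2 when xbar is close enough to zero.

    Two one-sided tests: xbar must be significantly below +2
    AND significantly above -2.
    """
    se = sigma / math.sqrt(n)
    crit = NormalDist().inv_cdf(1 - alpha)
    z_upper = (xbar - 2) / se   # against mu >= 2
    z_lower = (xbar + 2) / se   # against mu <= -2
    return z_upper < -crit and z_lower > crit

# With xbar = 0.1, sigma = 1, n = 25 (standard error 0.2):
# mu = 0 is not rejected, yet |mu| >= 2 is rejected,
# so the data "statistically falsify" |mu| >= 2.
```

Note that both outcomes can occur at once: failing to reject μ=0 never establishes μ=0, but rejecting |μ|≥2 does positively rule out all meaningfully large deviations, which is what the severity reasoning above is after.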
(3) This is indeed a philosophical question, and I have seen arguments in both directions.