- the p-value is less than the level of significance if and only if the corresponding CI does not include the null value; and vice versa,
- the p-value is greater than the level of significance if and only if the corresponding CI does include the null value.
Is this idea always correct? Is it only correct for certain sampling distributions, like the normal, but not correct in general?
If the idea is correct in general, then why? The p-value is calculated using the distribution of the statistic conditional on H0, while the CI is calculated using the unconditional distribution of the statistic. These are two different distributions – how or why does this lead to the duality?
If the idea is not correct in general, could you provide counterexamples? Do people typically think that these counterexamples are a problem and somehow try to correct them, to make the idea still be true?
Basically, the duality holds; see also this question about the duality: "Can we reject a null hypothesis with confidence intervals produced via sampling rather than the null hypothesis?"
I can think of two reasons to say that it doesn't hold (see below), but these are not because the duality is wrong; they are more about details and semantics (every kid has a parent, but that doesn't mean that each kid and each parent form a pair).
No, not correct 1
There is no single "the p-value" and no single "the confidence interval". Instead, there are multiple ways to define p-values and multiple ways to define confidence intervals.
So a particular confidence interval and a particular construction of a p-value need not correspond with each other.
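A minimal sketch of such a mismatch, assuming (purely for illustration) a binomial experiment with 1 success in 10 trials and a null value of 0.3. The helper names `wald_interval` and `exact_binomial_p_value` are hypothetical; the first is the usual normal-approximation interval, the second sums the null probabilities of all outcomes that are no more likely than the observed one.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Approximate (Wald) confidence interval for a binomial proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def exact_binomial_p_value(successes, n, p0):
    """Two-sided exact p-value: total probability under p0 of all outcomes
    that are no more likely than the observed one."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    threshold = pmf[successes] * (1 + 1e-7)  # small tolerance for float ties
    return sum(prob for prob in pmf if prob <= threshold)


# Hypothetical data: 1 success in 10 trials, null value p0 = 0.3.
low, high = wald_interval(1, 10)
p_value = exact_binomial_p_value(1, 10, 0.3)

ci_excludes_null = not (low <= 0.3 <= high)
test_rejects = p_value < 0.05
print(ci_excludes_null, test_rejects)
```

Here the 95% Wald interval excludes 0.3 while the exact test does not reject at the 5% level. The two procedures come from different constructions, so nothing forces them to agree.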
Yes, correct 1
But there is a correspondence: every confidence interval can be used as a hypothesis test, and confidence distributions can be used to compute p-values for particular parameters/hypotheses.
The reason is that a p% confidence interval contains the parameter p% of the time, no matter what the true parameter is.
So, given that a hypothesis is true, the probability that the hypothesized value falls outside a p% confidence interval is (100 − p)%. The false rejection probability, if you use confidence intervals as a test, is therefore (100 − p)%.
The only cases where this does not work are when the confidence intervals are not exact, e.g. when they are approximations or estimates. But then you should allow the same freedom for p-values, which can also be approximations or estimates.
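The argument can be checked with a small simulation. This is only a sketch under stated assumptions: normally distributed data with known standard deviation, a two-sided z-test, and the matching z-interval. The function names and the default parameter values are hypothetical choices made for the illustration.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_interval(sample, sigma, level=0.95):
    """Two-sided confidence interval for the mean, sigma assumed known."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def z_test_p_value(sample, sigma, mu0):
    """Two-sided p-value for H0: mean == mu0, sigma assumed known."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate(true_mean=3.0, sigma=2.0, n=25, level=0.95, trials=5000, seed=1):
    """Test the TRUE mean repeatedly; return (agreement rate, rejection rate)."""
    rng = random.Random(seed)
    alpha = 1 - level
    agreements = 0
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        outside = not (low <= true_mean <= high)
        rejected = z_test_p_value(sample, sigma, true_mean) < alpha
        agreements += outside == rejected
        rejections += outside
    return agreements / trials, rejections / trials


agreement_rate, false_rejection_rate = simulate()
print(agreement_rate, false_rejection_rate)
```

With a matched test and interval, the decision "p-value below 5%" and the decision "null value outside the 95% interval" should coincide in essentially every trial, and the false rejection rate should come out close to 5% whatever `true_mean` is set to.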
No, not correct 2
The other way around is not necessarily true. With a given p-value (or, more generally, a construction method for p-values) you cannot always construct a confidence interval. Instead, you sometimes end up with a confidence region (a set of disjoint intervals).
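A toy illustration of such a region, assuming a hypothetical model that is not taken from the question: a single observation X ~ N(θ², 1) with observed value x = 4. Collecting all θ₀ that a two-sided z-test on θ₀² does not reject at the 5% level gives two separate intervals, because θ and −θ are indistinguishable. The grid scan and the function names are illustrative choices.

```python
from statistics import NormalDist


def p_value(x, theta0):
    """Two-sided p-value for H0: theta = theta0 when X ~ N(theta0**2, 1)."""
    z = abs(x - theta0 ** 2)
    return 2 * (1 - NormalDist().cdf(z))


def confidence_region(x, alpha=0.05, lo=-3.0, hi=3.0, step=0.01):
    """Scan a grid of theta0 values and return the maximal runs of values
    that are NOT rejected at level alpha, as (start, end) pairs."""
    n_steps = round((hi - lo) / step)
    runs, start, prev = [], None, None
    for i in range(n_steps + 1):
        theta0 = lo + i * step
        if p_value(x, theta0) >= alpha:
            if start is None:
                start = theta0  # a new accepted run begins here
            prev = theta0
        elif start is not None:
            runs.append((start, prev))  # the run ended at the previous point
            start = None
    if start is not None:
        runs.append((start, prev))
    return runs


region = confidence_region(x=4.0)
for left, right in region:
    print(round(left, 2), round(right, 2))
```

The accepted set consists of one run of negative values and one run of positive values, with a gap around zero, so it is a valid 95% confidence region but not a single interval.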