I’m reviewing a paper that has performed >15 separate 2×2 chi-square tests. I’ve suggested that the authors need to correct for multiple comparisons, but they have replied that all the comparisons were planned and that a correction is therefore not necessary.
I feel like this must not be correct but can’t find any resources that explicitly state whether this is the case.
Is anyone able to help with this?
Thanks for all of your very helpful responses. In response to @gung’s request for more information on the study and the analyses: they are comparing count data for two types of participants (students and non-students) in two conditions, across three time periods. The multiple 2×2 chi-square tests compare the time periods with each other, within each condition and for each type of participant (if that makes sense; e.g., students, condition 1, time period 1 vs. time period 2), so all of the analyses are testing the same hypothesis.
This is, IMHO, a complex issue, and I would like to make three comments about this situation.
First, and more generally, I would focus less on whether the comparisons were planned (after all, you can simply plan to make all possible comparisons) than on whether you are facing a confirmatory study, with a set of well-shaped hypotheses defined in an argumentative context, or an exploratory study, in which many likely indicators are observed.
Second, I would also focus on how the resulting p-values are then discussed. Are they used individually to support a set of definitive conclusions, or are they discussed jointly as a body of evidence (and lack of evidence)?
Finally, I would discuss the possibility that the >15 hypotheses behind the >15 separate chi-squared tests are in fact expressions of a few hypotheses (maybe a single one) that could be summarized.
More generally, regardless of whether the hypotheses are prespecified, whether or not to correct for multiple comparisons is a matter of which type I error rate you want to control. By not correcting for multiple comparisons, you control only the per-comparison type I error rate. With numerous comparisons, the family-wise type I error rate (the probability of at least one false positive across all of the tests) is therefore high, and the analysis is more prone to false discoveries.
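To put a number on the last point, here is a minimal Python sketch (an illustration, not something from the paper under review). It assumes m independent tests, each at α = 0.05 with every null hypothesis true, in which case the family-wise error rate is 1 − (1 − α)^m. The paper's tests share data and are not independent, so the exact figure will differ, although the Bonferroni bound (FWER ≤ mα) holds under any dependence. The sketch also includes Holm's step-down adjustment, one standard way of controlling the family-wise error rate. The function names and the p-values in the usage section are hypothetical.

```python
"""Illustrate family-wise error inflation and two standard corrections.

Assumptions: the exact FWER formula assumes independent tests with all
null hypotheses true; the p-values in __main__ are made up for illustration.
"""


def family_wise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) for m independent tests at level alpha,
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(m: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps FWER <= alpha under any dependence."""
    return alpha / m


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    m = 15
    print(f"Uncorrected FWER for {m} independent tests: {family_wise_error_rate(m):.3f}")
    print(f"Bonferroni per-test alpha: {bonferroni_threshold(m):.4f}")

    # Hypothetical p-values, purely for illustration.
    p = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2]
    for raw, adj in zip(p, holm_adjust(p)):
        print(f"raw p = {raw:.3f}  Holm-adjusted p = {adj:.3f}")
```

With 15 independent tests the uncorrected family-wise error rate is about 0.54, i.e., better-than-even odds of at least one false positive when every null is true. With the made-up p-values above, five of six fall below 0.05 unadjusted, but only the first three remain below 0.05 after Holm's adjustment.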