# Unequal sample sizes: When to call it quits

I’m peer reviewing an academic journal article and the authors wrote the following as justification for not reporting any inferential statistics (I deidentified the nature of the two groups):

> In total, 25 of the 2,349 (1.1%) respondents reported X. We appropriately refrain from presenting analyses that statistically compare group X to group Y (the other 2,324 participants) since those results could be heavily driven by chance with an outcome this rare.

My question is: are the authors of this study justified in throwing in the towel with respect to comparing groups? If not, what might I recommend to them?

Type II error rates, however, will be very much affected by highly unequal $n$s. This is true regardless of the test (e.g., the $t$-test, the Mann-Whitney $U$-test, and the $z$-test for equality of proportions will all be affected in this way). For an example of this, see my answer here: How should one interpret the comparison of means from different sample sizes? Thus, they may well be “justified in throwing in the towel” with respect to this issue. (Specifically, if you expect to get a non-significant result whether the effect is real or not, what is the point of the test?)
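To make the type II error point concrete, here is a minimal sketch (an illustration added here, not taken from the linked answer) that holds the total at the study's $N = 2{,}349$ and shrinks the smaller group. It assumes a normal approximation to the two-sample $t$-test with equal variances; the effect size $d = 0.2$ is a hypothetical value, and `approx_power` is a hypothetical helper name.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Two-sided power of a two-sample z-test for a standardized mean
    difference d (normal approximation to the t-test; equal variances assumed)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Noncentrality: d times the square root of the "effective" n, n1*n2/(n1+n2),
    # which shrinks as the split becomes more unequal for a fixed total.
    delta = d * (n1 * n2 / (n1 + n2)) ** 0.5
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)


if __name__ == "__main__":
    total = 2349  # total N from the quoted study
    d = 0.2       # hypothetical effect size, for illustration only
    for n1 in (1174, 500, 100, 25, 5):
        n2 = total - n1
        print(f"n1={n1:5d}  n2={n2:5d}  power={approx_power(d, n1, n2):.3f}")
```

Under these assumptions, power is near 1 for a balanced split, falls below 0.2 at $n_1 = 25$, and keeps sliding toward $\alpha$ as $n_1$ shrinks further; with $d = 0$ the function returns exactly $\alpha$.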
Holding the total $N$ fixed, as the group sizes become more unequal, statistical power falls toward $\alpha$. This fact actually leads to a different suggestion, which I suspect few people have ever heard of and which would probably have trouble getting past reviewers (no offense intended): a compromise power analysis.

The idea is relatively straightforward. In any power analysis, $\alpha$, $\beta$, $n_1$, $n_2$, and the effect size $d$ are related to one another: having specified all but one, you can solve for the last. Typically, people do what is called an a priori power analysis, in which you solve for $N$ (generally assuming $n_1=n_2$). Alternatively, you can fix $n_1$, $n_2$, and $d$ and solve for $\alpha$ (or equivalently $\beta$), provided you specify the ratio of type I to type II error rates that you are willing to live with. Conventionally, $\alpha=.05$ and $\beta=.20$, so you are implicitly saying that type I errors are four times worse than type II errors. Of course, a given researcher might disagree with that, but having specified a ratio, you can solve for the $\alpha$ you should use so that the test retains some reasonable power. This approach is a logically valid option for the researchers in this situation, although I acknowledge that its exoticness may make it a tough sell in a larger research community that has probably never heard of such a thing.
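A compromise power analysis can be sketched the same way. This is an illustrative sketch under stated assumptions, not a standard implementation: it reuses the normal approximation, takes $q = \beta/\alpha$ as the error-ratio input ($q = 4$ reproduces the conventional $.20/.05$), and finds by bisection the $\alpha$ at which $\beta = q\alpha$ for fixed $n_1$, $n_2$, and $d$. The helper names and the effect size $d = 0.5$ are hypothetical. An a priori helper is included for contrast.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d, n1, n2, alpha):
    """Two-sided power of a two-sample z-test (normal approximation to the t-test)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)


def a_priori_n_per_group(d, alpha=0.05, beta=0.20):
    """A priori analysis: per-group n with n1 == n2 (ignores the tiny opposite tail)."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(1 - beta)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)


def compromise_alpha(d, n1, n2, q=4.0, tol=1e-12):
    """Compromise analysis: find alpha such that beta / alpha == q, with n1, n2, d fixed.

    f(alpha) = beta(alpha) - q * alpha is strictly decreasing in alpha,
    positive near 0 and negative near 1, so bisection finds the unique root.
    """
    def f(alpha):
        return (1 - approx_power(d, n1, n2, alpha)) - q * alpha

    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # beta still too large relative to q * alpha: raise alpha
        else:
            hi = mid  # beta already below q * alpha: lower alpha
    alpha = (lo + hi) / 2
    return alpha, 1 - approx_power(d, n1, n2, alpha)


if __name__ == "__main__":
    print("a priori n per group (d=0.5):", a_priori_n_per_group(0.5))
    alpha, beta = compromise_alpha(d=0.5, n1=25, n2=2324, q=4.0)
    print(f"compromise: alpha={alpha:.3f}, beta={beta:.3f}, power={1 - beta:.3f}")
```

Under these assumptions, the $n_1 = 25$, $n_2 = 2{,}324$ design with $d = 0.5$ lands near $\alpha \approx .065$ and $\beta \approx .26$, trading a slightly larger $\alpha$ than .05 for a modestly smaller $\beta$ (about .30 at $\alpha = .05$).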