# Power analysis for Kruskal-Wallis or Mann-Whitney U test using R?

Is it possible to perform a power analysis for the Kruskal-Wallis and Mann-Whitney U test? If yes, are there any R packages/functions that perform it?

It’s certainly possible to calculate power.

To be more specific – if you make sufficient assumptions to obtain a situation in which you can calculate (in some fashion) the probability of rejection, you can compute power.

In the Wilcoxon-Mann-Whitney, if (for example) you assume the distribution shapes (make an assumption about the distributional form(s)) and make some assumption about the scales (spreads) and specific values of the locations or difference in locations, you might be able to compute the power either algebraically or via numerical integration; failing that you can simulate the rejection rate.

So for example if we assume sampling from $t_5$ distributions with specified location-difference (standardized for a common scale), then given the sample sizes we could simulate many data sets satisfying all those conditions and so obtain an estimate of the rejection rate. So let’s assume we have two samples of $t_5$ distributions (location-scale family) with unit scale ($\sigma=1$) – without loss of generality – and with location difference $\delta=\mu_2-\mu_1=1$. Again, without loss of generality we could take $\mu_1=0$. Then for some specified sample size — $n_1=6,n_2=9$ (say) — we can simulate the observations and hence the power for that particular value of $\delta/\sigma$ (i.e. $1$). Here’s a quick example in R:

n1=6;n2=9;tdf=5;delta=1;al=0.05;nsim=10000
res = replicate(nsim,{y1=rt(n1,tdf);y2=rt(n2,tdf)+delta;wilcox.test(y1,y2)\$p.value<=al})
mean(res)  # res will be logical ("TRUE" = reject); mean is rej rate


Three simulations like that produced rejection rates of 0.321, 0.321 and 0.316; the power is apparently in the vicinity of 0.32 (you can compute a confidence interval from just a single one of those simulations, since the rejection count is binomial). In practice I tend to use larger simulations, but if you’re simulating a lot of different $n$‘s or $\delta$‘s you may not want to go much higher than 10000 simulations for each one.

By doing it for many values of the location shift you can even get a power curve for that set of circumstances as the location shift changes if you wish.

In large samples doubling $n_1$ and $n_2$ will be like halving $\sigma^2$ (and so increasing $\delta/\sigma$ at a given $\delta$) so you can often get good approximations at various $n$ from simulations at only a few $n$ values. Similarly, for one tailed tests, if $1-b_i$ is the rejection rate at $\delta=\delta_i$ then $\Phi^{-1}(1-b)$ tends to be close to linear in $\delta$ (again, allowing a good approximation at various $\delta$ values from simulations at only a few values of $\delta$ (a dozen well chosen values is often plenty). Sensible choices of smoothing will often produce remarkably good approximation of the power at other values of $n$ or $\delta$.

You needn’t limit yourself to location shifts of course. Any change in parameters that would tend to lead to a change in $P(Y_2>Y_1)$ will be something you can investigate.

Note that while these tests are distribution-free (for continuous distributions) under the null, the behavior is different under different distributional assumptions for the alternatives.

The situation for the Kruskal-Wallis is similar, but you have more location shifts (or whatever other situation you’re looking at) to specify.

The plot in this answer shows a comparison of a power curve for a paired t test against simulated power for a signed rank test at a particular sample size, across a variety of standardized location shifts for sampling from normal distributions with a specified correlation between pairs. Similar calculations can be done for the Mann-Whitney and the Kruskal-Wallis.