# Strange result of post-hoc test

I have data for a test on three groups. The measured variable is on a ratio scale. The data in R:

``````
g1a <- c(7, 3, 40)
g2a <- c(1, 1, 2)
g3a <- c(0, 0, 0)
``````

Since the sample is small and normality cannot be guaranteed, I run a Kruskal-Wallis test to check for significance:

``````
l <- list(g1a, g2a, g3a)
kruskal.test(l)
``````

The p-value is 0.02336, which is nice.

Now I run a post-hoc test, using the Mann-Whitney U:

``````
wilcox.test(g1a, g2a, paired = FALSE, exact = TRUE)
wilcox.test(g2a, g3a, paired = FALSE, exact = TRUE)
wilcox.test(g1a, g3a, paired = FALSE, exact = TRUE)
``````

All the resulting p-values are above 0.05 (0.07652, 0.0636, 0.05935). This is very strange. Shouldn’t one of these tests give a much lower p-value? Especially since I’d have to use some sort of correction to account for the multiple comparisons in the post-hoc test. In other words: how can I interpret this result?

Think of it this way: overall, there’s a significant difference, but it’s hard to say exactly which two groups differ. Alternatively, consider the chance of getting three p-values that are all below 0.1 if there were no real differences (even though the three tests aren’t independent of each other): pretty small, right? So, again overall, we might suspect that something significant is in the data without being able to tell exactly where.
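On the correction for multiple comparisons raised in the question, here is a minimal sketch using base R's `p.adjust` on the three p-values reported there. The vector names (`g1_vs_g2` and so on) are labels made up for this illustration, and Holm and Bonferroni are just two common choices, not a recommendation specific to this data.

```r
# Unadjusted p-values as reported in the question, in the order the tests were run.
# The element names are illustrative labels only.
p_raw <- c(g1_vs_g2 = 0.07652, g2_vs_g3 = 0.0636, g1_vs_g3 = 0.05935)

# Adjust for three comparisons
p_holm <- p.adjust(p_raw, method = "holm")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Side-by-side view: adjustment can only push p-values upward
print(rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf))
```

Since none of the raw p-values is below 0.05, no adjustment can bring any of them under that threshold; the correction only widens the gap between the overall test and the pairwise ones.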

Your small sample sizes don’t help; they mean the powers of your tests are very low, and also severely constrain what sort of p-values you can get, as the following example shows:

``````
> g1a <- rnorm(3,0,1)
> g2a <- rnorm(3,2.5,1)
> g3a <- rnorm(3,5,1)
>
> y <- list(g1a,g2a,g3a)
> y
[[1]]
[1] -2.31356435 -0.09903136 -0.42037052

[[2]]
[1] 2.806082 2.799857 3.383844

[[3]]
[1] 6.543636 6.845559 4.838341

> kruskal.test(y)

Kruskal-Wallis rank sum test

data:  y
Kruskal-Wallis chi-squared = 7.2, df = 2, p-value = 0.02732
``````

So far, so good. On to the three Wilcoxon tests:

``````
> wilcox.test(g1a,g2a,paired=FALSE,exact=TRUE)

Wilcoxon rank sum test

data:  g1a and g2a
W = 0, p-value = 0.1
alternative hypothesis: true location shift is not equal to 0

> wilcox.test(g2a,g3a,paired=FALSE,exact=TRUE)

Wilcoxon rank sum test

data:  g2a and g3a
W = 0, p-value = 0.1
alternative hypothesis: true location shift is not equal to 0

> wilcox.test(g1a,g3a,paired=FALSE,exact=TRUE)

Wilcoxon rank sum test

data:  g1a and g3a
W = 0, p-value = 0.1
alternative hypothesis: true location shift is not equal to 0
``````

All three p-values are 0.1, and W = 0 is as extreme as the statistic can get, so evidently we’ve hit a sample-size-imposed limit on the p-values.
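Where the 0.1 floor comes from can be sketched in a few lines. With no ties, the exact rank-sum test treats all `choose(n1 + n2, n1)` assignments of ranks to the first group as equally likely under the null, and only one of them gives W = 0, so the smallest two-sided p-value is `2 / choose(n1 + n2, n1)`. The helper name `min_exact_p` is made up for this illustration.

```r
# Smallest two-sided exact p-value the rank-sum test can return
# for group sizes n1 and n2, assuming no ties (hypothetical helper).
min_exact_p <- function(n1, n2) 2 / choose(n1 + n2, n1)

min_exact_p(3, 3)   # 2 / 20 = 0.1
min_exact_p(4, 4)   # 2 / 70, about 0.029
min_exact_p(5, 5)   # 2 / 252, about 0.008

# Cross-check with completely separated, tie-free samples (W = 0)
p_sep <- wilcox.test(c(1, 2, 3), c(4, 5, 6), exact = TRUE)$p.value

# Every two-sided exact p-value attainable with n1 = n2 = 3:
# W ranges over 0..9 and is symmetric, so the lower half W = 0..4 suffices.
attainable <- pmin(1, 2 * pwilcox(0:4, 3, 3))
print(attainable)
```

So with three observations per group, no pairwise exact test can ever fall below 0.1, whatever the data look like. The p-values in the question sit slightly below 0.1 only because those samples contain ties (`1, 1` and `0, 0, 0`): in that case `wilcox.test` cannot compute an exact p-value, warns about it, and falls back to a normal approximation.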