I have results from the same test applied to two independent samples:
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)
And I want to compute a Wilcoxon rank sum test.
When I calculate the statistic TW by hand, I get:
When I let R perform a wilcox.test(x, y, correct = F), I get:
W = 101.5
Why is that? Shouldn’t the statistic W+ only be returned when I perform a signed rank test with paired = T? Or do I misunderstand the rank sum test?
How can I tell R to output TW as part of the test results, rather than computing it separately with something like:
dat <- data.frame(v = c(x, y), s = factor(rep(c("x", "y"), c(10, 12))))
dat$r <- rank(dat$v)
T.W <- sum(dat$r[dat$s == "x"])
I asked a follow-up question about the meaning of the statistic: "Different ways to calculate the test statistic for the Wilcoxon rank sum test".
The Note in the help for the wilcox.test function clearly explains why R’s value is smaller than yours:
The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon’s original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)
That is, the definition R uses is n1(n1+1)/2 smaller than the version you use, where n1 is the number of observations in the first sample.
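To check that relationship on the data from the question, here is a minimal sketch that computes both versions and their difference. The variable names ranks, T.W and W are just illustrative choices, and suppressWarnings is only there because the ties in these data make R warn that it cannot compute an exact p-value.

```r
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)
n1 <- length(x)

# Rank the pooled sample (ties get midranks) and sum the ranks of x
ranks <- rank(c(x, y))
T.W <- sum(ranks[seq_len(n1)])

# Statistic as reported by R; the tie warning is suppressed for readability
W <- unname(suppressWarnings(wilcox.test(x, y, correct = FALSE))$statistic)

T.W                  # 156.5
W                    # 101.5
T.W - W              # 55
n1 * (n1 + 1) / 2    # 55, the smallest rank sum the first sample could have
```

So the two values differ by exactly the constant n1(n1+1)/2, and both lead to the same test.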
As for modifying the result, you could assign the output from wilcox.test to a variable, say a, and then manipulate a$statistic by adding the minimum possible rank sum, n1(n1+1)/2, to its value and changing its name. Then when you print a (e.g. by typing a), it will look the way you want.
To see what I am getting at, try this:
a <- wilcox.test(x, y, correct = FALSE)
str(a)
So for example if you do this:
n1 <- length(x)
a$statistic <- a$statistic + n1 * (n1 + 1) / 2
names(a$statistic) <- "T.W"
a
then you get:
Wilcoxon rank sum test with continuity correction

data: x and y
T.W = 156.5, p-value = 0.006768
alternative hypothesis: true location shift is not equal to 0
It’s quite common to refer to the rank sum statistic (whether shifted by n1(n1+1)/2 or not) as W or w or some close variant. It is also often called 'U' because of Mann & Whitney. There’s plenty of precedent for using W, so for myself I wouldn’t bother with the line that changes the name of the statistic, but if it suits you to do so there’s no reason why you shouldn’t.
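The link to the 'U' name can also be seen directly: the shifted value that R reports equals the Mann-Whitney count of pairs (x_i, y_j) with x_i > y_j, with ties counted as one half. The following is a small sketch of that count on the data from the question; the name U is my own choice here, not something wilcox.test returns.

```r
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

# Compare every x with every y: count wins for x, plus half of each tie
U <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Statistic reported by R (tie warning suppressed)
W <- unname(suppressWarnings(wilcox.test(x, y, correct = FALSE))$statistic)

U   # 101.5
W   # 101.5
```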