Effect size to Wilcoxon signed rank test?

Some authors (e.g. Pallant, 2007, p. 225; see image below) suggest calculating the effect size for a Wilcoxon signed rank test by dividing the test statistic by the square root of the number of observations:

$$r = \frac{Z}{\sqrt{N}}, \qquad N = n_x + n_y$$

Z is the standardized test statistic output by SPSS (see image below) and by wilcoxsign_test in R (see also my related question: teststatistic vs linearstatistic in wilcoxsign_test).
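To make this recipe concrete, here is a minimal Python sketch (standard library only) that computes the normal-approximation Z of the signed rank test and then r = Z/√N with N = n_x + n_y, i.e. all observations at both time points. Assumptions: zero differences are dropped, the usual tie correction is applied to the variance, and no continuity correction is used. SPSS and R's wilcoxsign_test have their own conventions for zeros and ties, so their Z may differ slightly. The function names are illustrative, not taken from any package.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_z(before, after):
    """Normal-approximation Z of the Wilcoxon signed rank test.

    Differences are after - before, so a positive Z means "after" tends
    to be larger. Zero differences are dropped; the variance is
    tie-corrected; no continuity correction is applied.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t) / 48 over groups of tied |d|.
    counts = {}
    for d in diffs:
        counts[abs(d)] = counts.get(abs(d), 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / 48
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    return (w_plus - mean) / math.sqrt(var)


def pallant_r(before, after):
    """r = Z / sqrt(N), with N = n_x + n_y (every observation, both times)."""
    z = signed_rank_z(before, after)
    return z / math.sqrt(len(before) + len(after))


if __name__ == "__main__":
    before = [10, 20, 30, 40, 50, 60, 70, 80]
    after = [11, 22, 33, 44, 55, 66, 77, 88]  # differences 1..8, all positive
    print(signed_rank_z(before, after))  # 18 / sqrt(51), about 2.52
    print(pallant_r(before, after))      # about 0.63
```

Whether N should count observations (n_x + n_y) or pairs changes r by a factor of √2, so it is worth stating which convention was used when reporting this r.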

Others suggest the Bravais-Pearson ($r = \frac{\operatorname{cov}(X,Y)}{\operatorname{sd}(X) \times \operatorname{sd}(Y)}$) or Spearman ($r_S$) correlation coefficient (depending on the data type).
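For comparison, a minimal standard-library sketch of the two correlation coefficients applied to the paired before/after columns. Spearman's r_S is computed as Pearson's r on average ranks, which is the usual definition when there are ties. The toy data are made up for illustration.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Bravais-Pearson r = cov(X, Y) / (sd(X) * sd(Y)).

    The 1/(n - 1) factors in the covariance and both SDs cancel,
    so plain sums of cross-products and squares are enough.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman r_S: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    before = [1, 2, 3, 4, 5]
    after = [2, 1, 4, 3, 6]
    print(pearson(before, after))   # about 0.82
    print(spearman(before, after))  # about 0.8
    # Adding the same constant to every "after" value leaves r unchanged.
    shifted = [a + 100 for a in after]
    print(pearson(before, shifted))
```

Both coefficients measure how consistently the paired scores move together, not how far they shift, so adding the same constant to every after value leaves them unchanged. A pure shift is what the signed rank test looks for, which may help explain why this r and Z/√N need not agree.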

When I calculate them, the two r values are not even remotely the same. For my current data, for example:

r = 0.23   (for $r = \frac{Z}{\sqrt{n_x + n_y}}$)

r = 0.43   (Pearson)

These would imply quite different effect sizes.

So which is the correct effect size to use, and how do the two r values relate to each other?

Pages 224 (bottom part) and 225 from Pallant, J. (2007). SPSS Survival Manual:

[Images of pp. 224 and 225 not reproduced]


  • If you don’t have ties, I would report the proportion of after values that are less than the corresponding before values.
  • If you do have ties, you could report the proportion of after values that are less than before out of the total number of non-tied pairs, or report all three proportions (<, =, >) and perhaps the sum of whichever two are most meaningful. For example, you could say ‘33% had less fear of statistics, 57% were unchanged, and 10% had more fear after the course such that 90% were the same as or better than before’.
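Both reporting options above fit in a few lines of Python. This is a minimal sketch: the data are hypothetical, chosen only to reproduce the 33% / 57% / 10% split from the example sentence, and lower scores are assumed to mean less fear (i.e. "better").

```python
def change_proportions(before, after):
    """Proportions of pairs with after < before, after == before, after > before."""
    n = len(before)
    less = sum(a < b for b, a in zip(before, after))
    same = sum(a == b for b, a in zip(before, after))
    more = n - less - same
    return less / n, same / n, more / n


def proportion_less_untied(before, after):
    """Proportion of after < before among pairs that are not tied."""
    untied = [(b, a) for b, a in zip(before, after) if a != b]
    return sum(a < b for b, a in untied) / len(untied)


if __name__ == "__main__":
    # Hypothetical fear scores for 100 students (lower = less fear).
    before = [5] * 100
    after = [4] * 33 + [5] * 57 + [6] * 10
    less, same, more = change_proportions(before, after)
    print(less, same, more)                         # 0.33 0.57 0.1
    print(round(less + same, 2))                    # 0.9, "same or better"
    print(proportion_less_untied(before, after))    # 33/43, about 0.77
```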

Generally speaking, a hypothesis test outputs a p-value that can be used to decide whether or not to reject the null hypothesis while controlling the type I error rate. The p-value, however, conflates the size of the effect with how clearly the data show it to be inconsistent with the null (in essence, how much data the test had access to). An effect size generally tries to remove the influence of N so as to isolate the magnitude of the effect. That line of reasoning explains the rationale behind dividing z by √N. However, a major consideration with effect size measures is interpretability. Most commonly, that consideration plays out in choosing between a raw effect size and a standardized effect size. (I suppose we could call z/√N a standardized effect size, for what that’s worth.) At any rate, my guess is that reporting z/√N won’t give people a quick, straightforward intuition about your effect.
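A rough large-sample calculation shows why dividing by √N strips out the sample size. It is a sketch under simplifying assumptions: n nonzero paired differences, no ties, and θ denoting the share of the total rank sum T that comes from positive differences, held fixed as n grows.

```latex
T = \frac{n(n+1)}{2}, \qquad \theta = \frac{W^{+}}{T}, \qquad
E[W^{+}] = \frac{n(n+1)}{4}, \qquad
\operatorname{Var}(W^{+}) = \frac{n(n+1)(2n+1)}{24}

z = \frac{W^{+} - E[W^{+}]}{\sqrt{\operatorname{Var}(W^{+})}}
  = \left(\theta - \tfrac{1}{2}\right)\sqrt{\frac{6n(n+1)}{2n+1}}
  \approx \left(\theta - \tfrac{1}{2}\right)\sqrt{3n}

\frac{z}{\sqrt{n}} \approx \sqrt{3}\left(\theta - \tfrac{1}{2}\right)
```

So for a fixed effect z keeps growing like √n, while z/√n settles near √3(θ − ½), which depends only on θ. With N = 2n observations (two per pair), z/√N ≈ √(3/2)(θ − ½). Ties and zero differences change the variance, so this is only an approximation.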

There is another wrinkle, though. While you want an estimate of the size of the overall effect, people typically use the Wilcoxon signed rank test with data that are only ordinal, that is, where they don’t trust that the data can reliably indicate the magnitude of the shift within a student, only that a shift occurred. That brings me back to the proportion improved discussed above.

On the other hand, if you do trust that the values are intrinsically meaningful (e.g., you only used the signed rank test for its robustness to non-normality and outliers), you could simply use the raw mean or median difference, or the standardized mean difference, as a measure of effect.
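When the values are trusted, the raw and standardized effects take only a few lines. This minimal sketch uses the mean and the median of the paired differences (after − before) plus one standardized mean difference. "Standardized mean difference" has several variants; dividing by the SD of the differences (sometimes written d_z) is the assumption made here, and dividing by the SD of the before scores or a pooled SD is also common. The data are made up.

```python
import statistics


def paired_effects(before, after):
    """Raw and standardized effect sizes for paired data (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    # Median of the paired differences (not the difference of the medians).
    median_diff = statistics.median(diffs)
    # Assumed standardizer: sample SD of the differences (d_z).
    sd_diff = statistics.stdev(diffs)
    return mean_diff, median_diff, mean_diff / sd_diff


if __name__ == "__main__":
    before = [10, 12, 9, 11, 13]
    after = [8, 11, 9, 8, 10]
    mean_diff, median_diff, d_z = paired_effects(before, after)
    print(mean_diff)    # -1.8
    print(median_diff)  # -2
    print(d_z)          # about -1.38
```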

Source: Link, Question Author: Community, Answer Author: gung – Reinstate Monica
