Proportion, ratio, and percentage data are very common in ecology (e.g., % of flowers pollinated, male:female sex ratio, % mortality in response to a treatment, % of a leaf eaten by an herbivore). Some applied statisticians recently published an article in the journal Ecology titled “The arcsine is asinine: the analysis of proportions in ecology.” They note that the arcsine transformation has been promoted by long-running texts such as Zar’s “Biostatistical Analysis” and Sokal and Rohlf’s “Biometry” (both in their 3rd or 4th editions), but that the technique has been made obsolete by generalized linear models and better computing:
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. […] For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues.
I was wondering how common proportion data are in other fields (psychology? medicine?). Is the arcsine transformation still commonly used in other fields, or are ecologists exceptional in their use of this (or other) outmoded or suboptimal techniques? Have there been papers in other fields that highlight the need to use more advanced techniques?
I teach it to public health students for two reasons:
one of my colleagues teaches it (in the introductory course) as a magic recipe, whereas I show them the delta method and how the transformation is derived from it;
I think the delta method and variance-stabilizing transformations are not asinine and can be useful. The confidence interval computed using the arcsine transform with a continuity correction is not perfect but behaves reasonably well, and for small samples it is much better¹ than the Wald procedure, which is still widely used.
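To make the derivation concrete: for a binomial proportion p̂ = x/n, Var(p̂) = p(1 − p)/n, and the delta method gives Var(g(p̂)) ≈ g′(p)² · p(1 − p)/n. With g(p) = arcsin(√p), g′(p) = 1 / (2√(p(1 − p))), so the variance is approximately 1/(4n) whatever p is. The sketch below turns that into an interval and puts it next to the Wald interval. It is only an illustration: the function names are mine, it uses the plain arcsine interval without the continuity correction mentioned above, and clipping the bounds to the valid range is my own choice.

```python
import math
from statistics import NormalDist


def _z(level):
    """Two-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def _check(x, n):
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")


def wald_interval(x, n, level=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]."""
    _check(x, n)
    p_hat = x / n
    half = _z(level) * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def arcsine_interval(x, n, level=0.95):
    """Plain arcsine interval (no continuity correction; illustrative only).

    On the arcsin(sqrt(p)) scale the delta method gives a variance of about
    1 / (4n), so the interval is built there and back-transformed with sin^2.
    """
    _check(x, n)
    centre = math.asin(math.sqrt(x / n))
    half = _z(level) / (2 * math.sqrt(n))
    # Clip to [0, pi/2] so that sin^2 stays monotone on the back-transform.
    lower = max(0.0, centre - half)
    upper = min(math.pi / 2, centre + half)
    return math.sin(lower) ** 2, math.sin(upper) ** 2


if __name__ == "__main__":
    for x, n in [(0, 10), (1, 10), (5, 10)]:
        print(x, n, "Wald:", wald_interval(x, n), "arcsine:", arcsine_interval(x, n))
```

With x = 0 the Wald interval collapses to the single point 0, because its standard error is estimated from p̂ itself, while the arcsine interval still has a positive upper bound because its width depends only on n. That is one small-sample symptom of the Wald procedure. This sketch does not by itself show that the arcsine interval has better coverage.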
As John says of psychology and neuroscience, I think many people in epidemiology don’t even care; they just use linear models in a push-button way.
¹ Pires, Amado, 2008. Interval estimators for a binomial proportion.