Distribution for percentage data

I have a question about the correct distribution to use when modelling my data. I conducted a forest inventory with 50 plots, each measuring 20 m × 50 m. For each plot, I estimated the percentage of the ground that is shaded by tree canopy, so each plot has a single canopy-cover value. Expressed as proportions, the values range from 0 to 0.95. I am modelling tree canopy cover (the Y variable) with a matrix of independent X variables based on satellite imagery and environmental data.

I am not sure whether I should use a binomial distribution, since a binomial random variable is the sum of n independent trials (i.e., Bernoulli random variables). My percentage values are not sums of trials; they are the observed percentages themselves. Should I use gamma, even though it has no upper limit? Should I convert the percentages to integers and model them as Poisson counts? Should I just stick with Gaussian? I have not found many examples in the literature or in textbooks that model percentages in this way. Any hints or insights are appreciated.

Thank you for your answers. In fact, the beta distribution is exactly what I need and is thoroughly discussed in this article:

The following article discusses a good way to transform a beta-distributed response variable when it includes true 0’s and/or 1’s in the range of percentages:


You are right that the binomial distribution describes discrete proportions that arise as the number of ‘successes’ in a finite number of Bernoulli trials, and that this makes it inappropriate for your data. For continuous proportions, you should use the beta distribution. One way to see why it fits: if X and Y are independent Gamma random variables with a common scale, then X / (X + Y) follows a beta distribution, so it is a ratio that necessarily lies between 0 and 1.
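The Gamma-ratio construction of the beta distribution can be checked by simulation. The following is a minimal sketch using only Python's standard library; the shape values 2 and 5, the seed, and the helper name `beta_from_gammas` are arbitrary choices for illustration and do not come from the question.

```python
import random

def beta_from_gammas(alpha, beta, rng):
    """Draw one Beta(alpha, beta) variate as X / (X + Y) with independent Gammas."""
    x = rng.gammavariate(alpha, 1.0)  # shape alpha, scale 1
    y = rng.gammavariate(beta, 1.0)   # shape beta, same scale as x
    return x / (x + y)

rng = random.Random(42)           # fixed seed so the run is repeatable
alpha, beta = 2.0, 5.0            # illustrative shape parameters (assumed)
draws = [beta_from_gammas(alpha, beta, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = alpha / (alpha + beta)  # mean of a Beta(alpha, beta)
print(sample_mean, theoretical_mean)
```

The sample mean should land close to alpha / (alpha + beta), and every draw stays inside the unit interval, which is what makes the beta a natural candidate for a continuous proportion such as canopy cover.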

I have an example of beta regression in my answer here: Remove effect of factor on continuous proportion data using regression in R.
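To make the idea of beta regression concrete without relying on the linked answer, here is a rough sketch of the likelihood that such a model maximises. It assumes the common mean-precision parameterisation, Beta(mu * phi, (1 - mu) * phi), with a logit link for the mean; this is one standard formulation and not necessarily the one used in the linked answer. The toy data and coefficient values are hypothetical.

```python
import math

def logistic(eta):
    """Inverse logit: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y in (0, 1)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def neg_log_likelihood(coefs, phi, X, y):
    """Negative log-likelihood of a beta regression with a logit link."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(c * x for c, x in zip(coefs, row))  # linear predictor
        total -= beta_logpdf(yi, logistic(eta), phi)
    return total

# Hypothetical toy data: an intercept column plus one made-up predictor.
X = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]
y = [0.15, 0.40, 0.80]

nll_good = neg_log_likelihood([-2.0, 4.0], 10.0, X, y)  # increasing trend
nll_bad = neg_log_likelihood([2.0, -4.0], 10.0, X, y)   # reversed trend
print(nll_good, nll_bad)
```

In practice one would hand `neg_log_likelihood` to a numerical optimiser, or use an existing beta regression implementation, rather than evaluate it by hand. Note that `beta_logpdf` is undefined at exactly 0 or 1, which is why boundary values need separate treatment.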

@DimitriyV.Masterov raises the good point that you mention your data have 0’s, but the beta distribution is only supported on (0, 1). This prompts the question of what should be done with such values. Some ideas can be gleaned from this excellent CV thread: How small a quantity should be added to x to avoid taking the log of 0?
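One option that is often suggested for pulling exact 0's and 1's into the open interval is the "squeeze" transformation y' = (y(n - 1) + 0.5) / n, commonly attributed to Smithson and Verkuilen (2006), where n is the sample size. The sketch below applies it with n = 50, the number of plots in the question. Whether this is the transformation described in the article or the thread mentioned above is an assumption, and the function name `squeeze` is made up for illustration.

```python
def squeeze(y, n):
    """Shrink a proportion in [0, 1] slightly toward 0.5 so it lies in (0, 1)."""
    return (y * (n - 1) + 0.5) / n

n_plots = 50  # number of plots, taken from the question

for value in (0.0, 0.95, 1.0):
    print(value, squeeze(value, n_plots))
```

With 50 plots, a value of 0 becomes 0.01 and a value of 1 would become 0.99, while interior values move only slightly. The size of the shift depends on n, so it is worth checking how sensitive the fitted model is to this choice.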

Source: Link, Question Author: Ron, Answer Author: Community
