Zero-inflated distributions, what are they really?

I am struggling to understand zero-inflated distributions. What are they? What’s the point?

If I have data with many zeroes, I could first fit a logistic regression to estimate the probability of a zero, then remove all the zeroes and fit a regular regression with my choice of distribution (e.g. Poisson).

Then somebody told me “hey, use a zero-inflated distribution”, but looking it up, it does not seem to do anything different from what I suggested above. It has a regular parameter $\mu$, and then another parameter $p$ to model the probability of zero? It just does both things at the same time, no?

Answer

“… I could first fit a logistic regression to estimate the probability of a zero, then remove all the zeroes and fit a regular regression with my choice of distribution (e.g. Poisson).”

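You’re absolutely right. This is one way to fit a zero-inflated model (or, as Achim Zeileis points out in the comments, this is strictly a “hurdle model”, which one could view as a special case of a zero-inflated model). One caveat: once the zeroes are removed, the remaining data cannot contain a zero, so the distribution in the second step should be zero-truncated (a zero-truncated Poisson rather than a plain Poisson, for example).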
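
To make the distinction concrete, the two probability mass functions can be written side by side. This is a sketch in the question’s notation, assuming $p$ is the probability attached to the zero component and $\mu$ is the mean of a Poisson count component; $f(k;\mu)$ is shorthand introduced here for the ordinary Poisson pmf.

$$
f(k;\mu) = \frac{e^{-\mu}\mu^{k}}{k!}
$$

Zero-inflated Poisson:

$$
P(Y=0) = p + (1-p)\,e^{-\mu}, \qquad P(Y=k) = (1-p)\,f(k;\mu) \quad (k \ge 1)
$$

Hurdle Poisson:

$$
P(Y=0) = p, \qquad P(Y=k) = (1-p)\,\frac{f(k;\mu)}{1-e^{-\mu}} \quad (k \ge 1)
$$

In the zero-inflated version a zero can come from either component, while in the hurdle version every zero comes from the binary part and the count part only ever produces positive values.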

The difference between the procedure you described and an “all-in-one” zero-inflated model is error propagation. As with other two-step procedures in statistics, the uncertainty of your step-2 predictions won’t account for the uncertainty about whether the prediction should be 0 in the first place.

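Sometimes this is a necessary evil. Fortunately, it’s not necessary in this case. In R, you can fit the whole model in one call with pscl::hurdle() (or pscl::zeroinfl() for a zero-inflated regression), or use fitdistrplus::fitdist() to fit a distribution without covariates.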
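
For readers who want to see the mechanics rather than call a package, here is a minimal sketch of the two-step idea in plain Python (not the R functions named above). It assumes an intercept-only model with no covariates and a Poisson count part; the function names `zip_pmf`, `hurdle_pmf` and `fit_hurdle_two_step` and the toy data are made up for illustration.

```python
import math

def zip_pmf(k, p, mu):
    """Zero-inflated Poisson: an extra zero with probability p, else Poisson(mu)."""
    pois = math.exp(-mu) * mu**k / math.factorial(k)
    return p * (k == 0) + (1 - p) * pois

def hurdle_pmf(k, p, mu):
    """Hurdle Poisson: zero with probability p, else zero-truncated Poisson(mu)."""
    if k == 0:
        return p
    pois = math.exp(-mu) * mu**k / math.factorial(k)
    return (1 - p) * pois / (1 - math.exp(-mu))

def fit_hurdle_two_step(y):
    """Intercept-only two-step fit: share of zeros, then truncated-Poisson MLE."""
    positives = [v for v in y if v > 0]
    # Step 1: with no covariates, the "logistic regression" is just a proportion.
    p_hat = 1 - len(positives) / len(y)
    # Step 2: the zero-truncated Poisson MLE solves
    #   mu / (1 - exp(-mu)) = mean of the positive counts.
    mean_pos = sum(positives) / len(positives)
    lo, hi = 1e-9, mean_pos  # the left-hand side is increasing; root is in (0, mean_pos]
    for _ in range(100):     # plain bisection
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean_pos:
            lo = mid
        else:
            hi = mid
    return p_hat, (lo + hi) / 2

y = [0, 0, 0, 0, 0, 0, 1, 2, 2, 3]  # made-up toy counts
p_hat, mu_hat = fit_hurdle_two_step(y)
print(p_hat, mu_hat)
```

On this toy data the positive counts average 2.0, which is what a plain Poisson fitted to the non-zero values would report as its mean, whereas the zero-truncated fit gives a $\mu$ of roughly 1.59. That gap is why the second step needs a truncated distribution. The sketch only produces point estimates; it says nothing about the uncertainty of predictions.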

Attribution
Source: Link, Question Author: Calro, Answer Author: shadowtalker
