# How to fit a Weibull distribution to input data containing zeroes?

I’m trying to reproduce an existing prediction algorithm, handed down by a retired researcher. The first step is to fit some observed data to a Weibull distribution, to obtain a shape and scale which will be used for predicting future values. I’m using R to do this. Here’s an example of my code:

x<-c(23,19,37,38,40,36,172,48,113,90,54,104,90,54,157,51,77,78,144,34,29,45,16,15,37,218,170,44,121)
f<-fitdistr(x, 'weibull')


This works fine unless there are any zeroes in the input array, which causes it to fail completely. The same thing happens in SAS. As I understand it, this is because one of the steps in calculating the Weibull distribution is taking the natural log, which is undefined for 0. Is there a reasonable way to work around this?

The best I’ve found so far is to add 1 to all of my input values, fit the curve, and then subtract one from my predicted values (“shift” the curve up and then back down by 1). This fits the previously predicted data fairly well, but seems like it must be a wrong way of doing so.

edit: The values in the input array are observed, real-world data (the number of occurrences of something) for a range of years. So in some years the number of occurrences was zero. Whether it’s the best way or not (I agree that it may not be), the original algorithm author claims to have used the Weibull distribution, and I have to try to replicate their process.

(As others have pointed out, a Weibull distribution is not likely to be an appropriate approximation when the data are integers only. The following is intended just to help you determine what that previous researcher did, rightly or wrongly.)

There are several alternative methods that are not affected by zeros in the data, such as using various method-of moments estimators. These typically require numerical solution of equations involving the gamma function, because the moments of the Weibull distribution are given in terms of this function. I’m not familiar with R, but here’s a Sage program that illustrates one of the simpler methods — maybe it can be adapted to R? (You can read about this and other such methods in, e.g., “The Weibull distribution: a handbook” by Horst Rinne, p. 455ff — however, there is a typo in his eq.12.4b, as the ‘-1’ is redundant).

"""
Blischke-Scheuer method-of-moments estimation of (a,b)
for the Weibull distribution F(t) = 1 - exp(-(t/a)^b)
"""

x = [23,19,37,38,40,36,172,48,113,90,54,104,90,54,157,
51,77,78,144,34,29,45,16,15,37,218,170,44,121]
xbar = mean(x)
varx = variance(x)
var("b"); f(b) = gamma(1+2/b)/gamma(1+1/b)^2 - 1 - varx/xbar^2
bhat = find_root(f, 0.01, 100)
ahat = xbar/gamma(1+1/bhat)
print "Estimates: (ahat, bhat) = ", (ahat, bhat)


This produced the output

Estimates: (ahat, bhat) =  (81.316784310814455, 1.3811394719075942)


If the above data are modified (just for illustration) by replacing the three smallest values by $0$, i.e.

x = [23,0,37,38,40,36,172,48,113,90,54,104,90,54,157,
51,77,78,144,34,29,45,0,0,37,218,170,44,121]


then the same procedure produces the output

Estimates: (ahat, bhat) =  (78.479354097488923, 1.2938352346035282)


EDIT: I just installed R to give it a try. At the risk of making this answer over-long, for anyone interested here’s my R-code for the Blischke-Scheuer method:

fit_weibull <- function(x)
{
xbar <- mean(x)
varx <- var(x)
f <- function(b){return(gamma(1+2/b)/gamma(1+1/b)^2 - 1 - varx/xbar^2)}
bhat <- uniroot(f,c(0.02,50))\$root
ahat <- xbar/gamma(1+1/bhat)
return(c(ahat,bhat))
}


This reproduces (to five significant digits) the two Sage examples above:

x <- c(23,19,37,38,40,36,172,48,113,90,54,104,90,54,157,
51,77,78,144,34,29,45,16,15,37,218,170,44,121)
fit_weibull(x)
 81.316840  1.381145

x <- c(23,0,37,38,40,36,172,48,113,90,54,104,90,54,157,
51,77,78,144,34,29,45,0,0,37,218,170,44,121)
fit_weibull(x)
 78.479180  1.293821