Weibull distribution parameters kk and cc for wind speed data

Hi can the same be shown to obtain shape and scale parameter for modified maximum likelihood method

Because @zaynah posted in the comments that the data are thought to follow a Weibull distribution, I’m gonna provide a short tutorial on how to estimate the parameters of such a distribution using MLE (Maximum likelihood estimation). There is a similar post about wind speeds and Weibull distribution on the site.

1. Download and install R, it’s free
2. Optional: Download and install RStudio, which is a great IDE for R providing a ton of useful functions such as syntax highlighting and more.
3. Install the packages MASS and car by typing: install.packages(c("MASS", "car")). Load them by typing: library(MASS) and library(car).
4. Import your data into R. If you have your data in Excel, for example, save them as delimited text file (.txt) and import them in R with read.table.
5. Use the function fitdistr to calculate the maximum likelihood estimates of your weibull distribution: fitdistr(my.data, densfun="weibull", lower = 0). To see a fully worked out example, see the link at the bottom of the answer.
6. Make a QQ-Plot to compare your data with a Weibull distribution with the scale and shape parameters estimated at point 5: qqPlot(my.data, distribution="weibull", shape=, scale=)

The tutorial of Vito Ricci on fitting distribution with R is a good starting point on the matter. And there are numerous posts on this site on the subject (see this post too).

To see a fully worked out example of how to use fitdistr, have a look at this post.

Let’s look at an example in R:

# Load packages

library(MASS)
library(car)

# First, we generate 1000 random numbers from a Weibull distribution with
# scale = 1 and shape = 1.5

rw <- rweibull(1000, scale=1, shape=1.5)

# We can calculate a kernel density estimation to inspect the distribution
# Because the Weibull distribution has support [0,+Infinity), we are truncate
# the density at 0

par(bg="white", las=1, cex=1.1)
plot(density(rw, bw=0.5, cut=0), las=1, lwd=2,
xlim=c(0,5),col="steelblue")


# Now, we can use fitdistr to calculate the parameters by MLE
# The option "lower = 0" is added because the parameters of the Weibull distribution need to be >= 0

fitdistr(rw, densfun="weibull", lower = 0)

shape        scale
1.56788999   1.01431852
(0.03891863) (0.02153039)


The maximum likelihood estimates are close to those we arbitrarily set in the generation of the random numbers. Let’s compare our data using a QQ-Plot with a hypothetical Weibull distribution with the parameters that we’ve estimated with fitdistr:

qqPlot(rw, distribution="weibull", scale=1.014, shape=1.568, las=1, pch=19)


The points are nicely aligned on the line and mostly within the 95%-confidence envelope. We would conclude that our data are compatible with a Weibull distribution. This was expected, of course, as we’ve sampled our values from a Weibull distribution.

Estimating the $k$ (shape) and $c$ (scale) of a Weibull distribution without MLE

This paper lists five methods to estimate the parameters of a Weibull distribution for wind speeds. I’m gonna explain three of them here.

From means and standard deviation

The shape parameter $k$ is estimated as:

and the scale parameter $c$ is estimated as:

with $\hat{v}$ is the mean wind speed and $\hat{\sigma}$ the standard deviation and $\Gamma$ is the Gamma function.

Least-squares fit to observed distribution

If the observed wind speeds are divided into $n$ speed interval $0-V_{1},V_{1}-V_{2},\ldots, V_{n-1}-V_{n}$, having frequencies of occurrence $f_{1}, f_{2},\ldots,f_{n}$ and cumulative frequencies $p_{1}=f_{1}, p_{2}=f_{1}+f_{2}, \ldots, p_{n}=p_{n-1}+f_{n}$, then you can fit a linear regression of the form $y=a+bx$ to the values

The Weibull parameters are related to the linear coefficients $a$ and $b$ by

Median and quartile wind speeds

If you don’t have the complete observed wind speeds but the median $V_{m}$ and quartiles $V_{0.25}$ and $V_{0.75}$ $\left[p(V\leq V_{0.25})=0.25, p(V\leq V_{0.75})=0.75\right]$, then $c$ and $k$ can be computed by the relations

Comparison of the four methods

Here is an example in R comparing the four methods:

library(MASS)  # for "fitdistr"

set.seed(123)
#-----------------------------------------------------------------------------
# Generate 10000 random numbers from a Weibull distribution
# with shape = 1.5 and scale = 1
#-----------------------------------------------------------------------------

rw <- rweibull(10000, shape=1.5, scale=1)

#-----------------------------------------------------------------------------
# 1. Estimate k and c by MLE
#-----------------------------------------------------------------------------

fitdistr(rw, densfun="weibull", lower = 0)
shape         scale
1.515380298   1.005562356

#-----------------------------------------------------------------------------
# 2. Estimate k and c using the leas square fit
#-----------------------------------------------------------------------------

n <- 100 # number of bins
breaks <- seq(0, max(rw), length.out=n)

freqs <- as.vector(prop.table(table(cut(rw, breaks = breaks))))
cum.freqs <- c(0, cumsum(freqs))

xi <- log(breaks)
yi <- log(-log(1-cum.freqs))

# Fit the linear regression
least.squares <- lm(yi[is.finite(yi) & is.finite(xi)]~xi[is.finite(yi) & is.finite(xi)])
lin.mod.coef <- coefficients(least.squares)

k <- lin.mod.coef[2]
k
1.515115
c <- exp(-lin.mod.coef[1]/lin.mod.coef[2])
c
1.006004

#-----------------------------------------------------------------------------
# 3. Estimate k and c using the median and quartiles
#-----------------------------------------------------------------------------

med <- median(rw)
quarts <- quantile(rw, c(0.25, 0.75))

k <- log(log(0.25)/log(0.75))/log(quarts[2]/quarts[1])
k
1.537766
c <- med/log(2)^(1/k)
c
1.004434

#-----------------------------------------------------------------------------
# 4. Estimate k and c using mean and standard deviation.
#-----------------------------------------------------------------------------

k <- (sd(rw)/mean(rw))^(-1.086)
c <- mean(rw)/(gamma(1+1/k))
k
1.535481
c
1.006938


All methods yield very similar results. The maximum likelihood approach has the advantage that the standard errors of the Weibull parameters are directly given.

Using bootstrap to add pointwise confidence intervals to the PDF or CDF

We can use a the non-parametric bootstrap to construct pointwise confidence intervals around the PDF and CDF of the estimated Weibull distribution. Here’s an R script:

#-----------------------------------------------------------------------------
# 5. Bootstrapping the pointwise confidence intervals
#-----------------------------------------------------------------------------

set.seed(123)

rw.small <- rweibull(100,shape=1.5, scale=1)

xs <- seq(0, 5, len=500)

boot.pdf <- sapply(1:1000, function(i) {
xi <- sample(rw.small, size=length(rw.small), replace=TRUE)
MLE.est <- suppressWarnings(fitdistr(xi, densfun="weibull", lower = 0))
dweibull(xs, shape=as.numeric(MLE.est[[1]][13]), scale=as.numeric(MLE.est[[1]][14]))
}
)

boot.cdf <- sapply(1:1000, function(i) {
xi <- sample(rw.small, size=length(rw.small), replace=TRUE)
MLE.est <- suppressWarnings(fitdistr(xi, densfun="weibull", lower = 0))
pweibull(xs, shape=as.numeric(MLE.est[[1]][15]), scale=as.numeric(MLE.est[[1]][16]))
}
)

#-----------------------------------------------------------------------------
# Plot PDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.pdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.pdf),
xlab="x", ylab="Probability density")
for(i in 2:ncol(boot.pdf)) lines(xs, boot.pdf[, i], col=rgb(.6, .6, .6, .1))

quants <- apply(boot.pdf, 1, quantile, c(0.025, 0.5, 0.975))
min.point <- apply(boot.pdf, 1, min, na.rm=TRUE)
max.point <- apply(boot.pdf, 1, max, na.rm=TRUE)
lines(xs, quants[1, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[3, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[2, ], col="darkred", lwd=2)
#lines(xs, min.point, col="purple")
#lines(xs, max.point, col="purple")


#-----------------------------------------------------------------------------
# Plot CDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.cdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.cdf),
xlab="x", ylab="F(x)")
for(i in 2:ncol(boot.cdf)) lines(xs, boot.cdf[, i], col=rgb(.6, .6, .6, .1))