# Weibull distribution parameters kk and cc for wind speed data

Hi can the same be shown to obtain shape and scale parameter for modified maximum likelihood method

Because @zaynah posted in the comments that the data are thought to follow a Weibull distribution, I’m gonna provide a short tutorial on how to estimate the parameters of such a distribution using MLE (Maximum likelihood estimation). There is a similar post about wind speeds and Weibull distribution on the site.

1. Download and install R, it’s free
2. Optional: Download and install RStudio, which is a great IDE for R providing a ton of useful functions such as syntax highlighting and more.
3. Install the packages MASS and car by typing: install.packages(c("MASS", "car")). Load them by typing: library(MASS) and library(car).
4. Import your data into R. If you have your data in Excel, for example, save them as delimited text file (.txt) and import them in R with read.table.
5. Use the function fitdistr to calculate the maximum likelihood estimates of your weibull distribution: fitdistr(my.data, densfun="weibull", lower = 0). To see a fully worked out example, see the link at the bottom of the answer.
6. Make a QQ-Plot to compare your data with a Weibull distribution with the scale and shape parameters estimated at point 5: qqPlot(my.data, distribution="weibull", shape=, scale=)

The tutorial of Vito Ricci on fitting distribution with R is a good starting point on the matter. And there are numerous posts on this site on the subject (see this post too).

To see a fully worked out example of how to use fitdistr, have a look at this post.

Let’s look at an example in R:

# Load packages

library(MASS)
library(car)

# First, we generate 1000 random numbers from a Weibull distribution with
# scale = 1 and shape = 1.5

rw <- rweibull(1000, scale=1, shape=1.5)

# We can calculate a kernel density estimation to inspect the distribution
# Because the Weibull distribution has support [0,+Infinity), we are truncate
# the density at 0

par(bg="white", las=1, cex=1.1)
plot(density(rw, bw=0.5, cut=0), las=1, lwd=2,
xlim=c(0,5),col="steelblue")


# Now, we can use fitdistr to calculate the parameters by MLE
# The option "lower = 0" is added because the parameters of the Weibull distribution need to be >= 0

fitdistr(rw, densfun="weibull", lower = 0)

shape        scale
1.56788999   1.01431852
(0.03891863) (0.02153039)


The maximum likelihood estimates are close to those we arbitrarily set in the generation of the random numbers. Let’s compare our data using a QQ-Plot with a hypothetical Weibull distribution with the parameters that we’ve estimated with fitdistr:

qqPlot(rw, distribution="weibull", scale=1.014, shape=1.568, las=1, pch=19)


The points are nicely aligned on the line and mostly within the 95%-confidence envelope. We would conclude that our data are compatible with a Weibull distribution. This was expected, of course, as we’ve sampled our values from a Weibull distribution.

## Estimating the $k$ (shape) and $c$ (scale) of a Weibull distribution without MLE

This paper lists five methods to estimate the parameters of a Weibull distribution for wind speeds. I’m gonna explain three of them here.

## From means and standard deviation

The shape parameter $k$ is estimated as:

and the scale parameter $c$ is estimated as:

with $\hat{v}$ is the mean wind speed and $\hat{\sigma}$ the standard deviation and $\Gamma$ is the Gamma function.

## Least-squares fit to observed distribution

If the observed wind speeds are divided into $n$ speed interval $0-V_{1},V_{1}-V_{2},\ldots, V_{n-1}-V_{n}$, having frequencies of occurrence $f_{1}, f_{2},\ldots,f_{n}$ and cumulative frequencies $p_{1}=f_{1}, p_{2}=f_{1}+f_{2}, \ldots, p_{n}=p_{n-1}+f_{n}$, then you can fit a linear regression of the form $y=a+bx$ to the values

The Weibull parameters are related to the linear coefficients $a$ and $b$ by

## Median and quartile wind speeds

If you don’t have the complete observed wind speeds but the median $V_{m}$ and quartiles $V_{0.25}$ and $V_{0.75}$ $\left[p(V\leq V_{0.25})=0.25, p(V\leq V_{0.75})=0.75\right]$, then $c$ and $k$ can be computed by the relations

## Comparison of the four methods

Here is an example in R comparing the four methods:

library(MASS)  # for "fitdistr"

set.seed(123)
#-----------------------------------------------------------------------------
# Generate 10000 random numbers from a Weibull distribution
# with shape = 1.5 and scale = 1
#-----------------------------------------------------------------------------

rw <- rweibull(10000, shape=1.5, scale=1)

#-----------------------------------------------------------------------------
# 1. Estimate k and c by MLE
#-----------------------------------------------------------------------------

fitdistr(rw, densfun="weibull", lower = 0)
shape         scale
1.515380298   1.005562356

#-----------------------------------------------------------------------------
# 2. Estimate k and c using the leas square fit
#-----------------------------------------------------------------------------

n <- 100 # number of bins
breaks <- seq(0, max(rw), length.out=n)

freqs <- as.vector(prop.table(table(cut(rw, breaks = breaks))))
cum.freqs <- c(0, cumsum(freqs))

xi <- log(breaks)
yi <- log(-log(1-cum.freqs))

# Fit the linear regression
least.squares <- lm(yi[is.finite(yi) & is.finite(xi)]~xi[is.finite(yi) & is.finite(xi)])
lin.mod.coef <- coefficients(least.squares)

k <- lin.mod.coef[2]
k
1.515115
c <- exp(-lin.mod.coef[1]/lin.mod.coef[2])
c
1.006004

#-----------------------------------------------------------------------------
# 3. Estimate k and c using the median and quartiles
#-----------------------------------------------------------------------------

med <- median(rw)
quarts <- quantile(rw, c(0.25, 0.75))

k <- log(log(0.25)/log(0.75))/log(quarts[2]/quarts[1])
k
1.537766
c <- med/log(2)^(1/k)
c
1.004434

#-----------------------------------------------------------------------------
# 4. Estimate k and c using mean and standard deviation.
#-----------------------------------------------------------------------------

k <- (sd(rw)/mean(rw))^(-1.086)
c <- mean(rw)/(gamma(1+1/k))
k
1.535481
c
1.006938


All methods yield very similar results. The maximum likelihood approach has the advantage that the standard errors of the Weibull parameters are directly given.

## Using bootstrap to add pointwise confidence intervals to the PDF or CDF

We can use a the non-parametric bootstrap to construct pointwise confidence intervals around the PDF and CDF of the estimated Weibull distribution. Here’s an R script:

#-----------------------------------------------------------------------------
# 5. Bootstrapping the pointwise confidence intervals
#-----------------------------------------------------------------------------

set.seed(123)

rw.small <- rweibull(100,shape=1.5, scale=1)

xs <- seq(0, 5, len=500)

boot.pdf <- sapply(1:1000, function(i) {
xi <- sample(rw.small, size=length(rw.small), replace=TRUE)
MLE.est <- suppressWarnings(fitdistr(xi, densfun="weibull", lower = 0))
dweibull(xs, shape=as.numeric(MLE.est[[1]][13]), scale=as.numeric(MLE.est[[1]][14]))
}
)

boot.cdf <- sapply(1:1000, function(i) {
xi <- sample(rw.small, size=length(rw.small), replace=TRUE)
MLE.est <- suppressWarnings(fitdistr(xi, densfun="weibull", lower = 0))
pweibull(xs, shape=as.numeric(MLE.est[[1]][15]), scale=as.numeric(MLE.est[[1]][16]))
}
)

#-----------------------------------------------------------------------------
# Plot PDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.pdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.pdf),
xlab="x", ylab="Probability density")
for(i in 2:ncol(boot.pdf)) lines(xs, boot.pdf[, i], col=rgb(.6, .6, .6, .1))

quants <- apply(boot.pdf, 1, quantile, c(0.025, 0.5, 0.975))
min.point <- apply(boot.pdf, 1, min, na.rm=TRUE)
max.point <- apply(boot.pdf, 1, max, na.rm=TRUE)
lines(xs, quants[1, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[3, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[2, ], col="darkred", lwd=2)
#lines(xs, min.point, col="purple")
#lines(xs, max.point, col="purple")


#-----------------------------------------------------------------------------
# Plot CDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.cdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.cdf),
xlab="x", ylab="F(x)")
for(i in 2:ncol(boot.cdf)) lines(xs, boot.cdf[, i], col=rgb(.6, .6, .6, .1))