Consider a beta distribution for a given set of ratings in [0,1]. After having calculated the mean:

$$
\mu = \frac{\alpha}{\alpha+\beta}
$$

Is there a way to provide a confidence interval around this mean?

**Answer**

While there are specific methods for calculating confidence intervals for the parameters of a beta distribution, I’ll describe a few general methods that can be used for (almost) *all sorts of distributions*, including the beta distribution, and that are easily implemented in R.

# Profile likelihood confidence intervals

Let’s begin with maximum likelihood estimation with corresponding profile likelihood confidence intervals. First we need some sample data:

```
# Sample size
n = 10
# Parameters of the beta distribution
alpha = 10
beta = 1.4
# Simulate some data
set.seed(1)
x = rbeta(n, alpha, beta)
# Note that the distribution is not symmetrical
curve(dbeta(x,alpha,beta))
```

The real/theoretical mean is

```
> alpha/(alpha+beta)
[1] 0.877193
```

Now we have to create a function for calculating the negative log-likelihood for a sample from the beta distribution, with the mean as one of the parameters. We can use the `dbeta()` function, but since it doesn’t use a parametrisation involving the mean, we have to express its parameters (*α* and *β*) as a function of the mean and some other parameter (like the standard deviation):

```
# Negative log likelihood for the beta distribution
nloglikbeta = function(mu, sig) {
  alpha = mu^2*(1-mu)/sig^2 - mu
  beta = alpha*(1/mu - 1)
  -sum(dbeta(x, alpha, beta, log=TRUE))
}
```
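The reparametrisation in `nloglikbeta()` is just the usual method-of-moments relationship: starting from

$$
\mu = \frac{\alpha}{\alpha+\beta}, \qquad
\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
$$

and solving for the two shape parameters gives

$$
\alpha = \frac{\mu^2(1-\mu)}{\sigma^2} - \mu, \qquad
\beta = \alpha\left(\frac{1}{\mu}-1\right),
$$

which is exactly what the function computes.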

To find the maximum likelihood estimate, we can use the `mle()` function in the `stats4` package:

```
library(stats4)
est = mle(nloglikbeta, start=list(mu=mean(x), sig=sd(x)))
```

Just ignore the warnings for now. They’re caused by the optimisation algorithm trying invalid values for the parameters, giving negative values for *α* and/or *β*. (To avoid the warnings, you can add a `lower` argument and change the optimisation `method` used.)
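For example (a sketch, reusing `x` and `nloglikbeta` from above; the bounds here are illustrative, and for these data the box constraints are enough to keep the optimiser away from invalid values):

```r
# Box-constrained optimisation keeps mu and sig strictly positive,
# so the optimiser no longer tries invalid parameter values
est = mle(nloglikbeta, start = list(mu = mean(x), sig = sd(x)),
          method = "L-BFGS-B", lower = c(0.001, 0.001))
```

Note that in general you may still need to guard against parameter combinations where the implied *α* or *β* is negative.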

Now we have both estimates and confidence intervals for our two parameters:

```
> est

Call:
mle(minuslogl = nloglikbeta, start = list(mu = mean(x), sig = sd(x)))

Coefficients:
        mu        sig 
0.87304148 0.07129112 

> confint(est)
Profiling...
         2.5 %    97.5 %
mu  0.81336555 0.9120350
sig 0.04679421 0.1276783

Note that, as expected, the confidence intervals are *not* symmetrical:

```
par(mfrow=c(1,2))
plot(profile(est)) # Profile likelihood plot
```

(The second-outer magenta lines show the 95% confidence interval.)

Also note that even with just 10 observations, we get very good estimates (a narrow confidence interval).

As an alternative to `mle()`, you can use the `fitdistr()` function from the `MASS` package. This too calculates the maximum likelihood estimate, and has the advantage that you only need to supply the density, not the negative log-likelihood. However, it doesn’t give you profile likelihood confidence intervals, only asymptotic (symmetrical) ones.
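For example (a sketch, reusing `x` from above; note that `fitdistr()` estimates *α* and *β* directly rather than the mean, and the starting values here are rough guesses):

```r
library(MASS)
# Fit the beta distribution by maximum likelihood; fitdistr()
# needs starting values for the two shape parameters
fit = fitdistr(x, "beta", start = list(shape1 = 2, shape2 = 2))
fit$estimate  # estimates of alpha (shape1) and beta (shape2)
# Asymptotic (symmetrical) 95% confidence intervals
cbind(lower = fit$estimate - 1.96 * fit$sd,
      upper = fit$estimate + 1.96 * fit$sd)
```

To get an interval for the mean itself, you would still have to propagate these into *μ* = *α*/(*α* + *β*), e.g. via the delta method.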

A better option is `mle2()` (and related functions) from the `bbmle` package, which is somewhat more flexible and powerful than `mle()`, and gives slightly nicer plots.
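A minimal sketch, reusing `x` and `nloglikbeta` from above (and assuming the `bbmle` package is installed):

```r
library(bbmle)
# mle2() has the same interface as mle() for this use case
est2 = mle2(nloglikbeta, start = list(mu = mean(x), sig = sd(x)))
confint(est2)        # profile likelihood confidence intervals
plot(profile(est2))  # slightly nicer profile plots
```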

# Bootstrap confidence intervals

Another option is to use the bootstrap. It’s extremely easy to use in R, and you don’t even have to supply a density function:

```
> library(simpleboot)
> x.boot = one.boot(x, mean, R=10^4)
> hist(x.boot) # Looks good
> boot.ci(x.boot, type="bca") # Confidence interval
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = x.boot, type = "bca")
Intervals :
Level BCa
95% ( 0.8246, 0.9132 )
Calculations and Intervals on Original Scale
```

The bootstrap has the added advantage that it works even if your data doesn’t come from a beta distribution.

# Asymptotic confidence intervals

For confidence intervals on the mean, let’s not forget the good old asymptotic confidence intervals based on the central limit theorem (and the *t*-distribution). As long as we have either a large sample size (so the CLT applies and the distribution of the sample mean is approximately normal) or large values of both *α* and *β* (so that the beta distribution itself is approximately normal), it works well. Here we have neither, but the confidence interval still isn’t too bad:

```
> t.test(x)$conf.int
[1] 0.8190565 0.9268349
```

For just slightly larger values of *n* (and not too extreme values of the two parameters), the asymptotic confidence interval works exceedingly well.
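For example (a quick check, reusing `alpha` and `beta` from the simulation above):

```r
# With n = 100 instead of n = 10, the asymptotic interval
# becomes very narrow
set.seed(1)
x2 = rbeta(100, alpha, beta)
t.test(x2)$conf.int
```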

**Attribution**
*Source: Link, Question Author: dominic, Answer Author: Karl Ove Hufthammer*