# How do I calculate a confidence interval for the mean of a log-normal data set?

I’ve heard/seen in several places that you can transform the data set into something that is normal-distributed by taking the logarithm of each sample, calculate the confidence interval for the transformed data, and transform the confidence interval back using the inverse operation (e.g. raise 10 to the power of the lower and upper bounds, respectively, for $\log_{10}$).

However, I’m a bit suspicious of this method, simply because it doesn’t work for the mean itself: $10^{\operatorname{mean}(\log_{10}(X))} \ne \operatorname{mean}(X)$

What is the correct way to do this? If it doesn’t work for the mean itself, how can it possibly work for the confidence interval for the mean?

There are several ways for calculating confidence intervals for the mean of a lognormal distribution. I am going to present two methods: Bootstrap and Profile likelihood. I will also present a discussion on the Jeffreys prior.

Bootstrap

For the MLE

In this case, the MLE of $$(μ,σ)(\mu,\sigma)$$ for a sample $$(x1,...,xn)(x_1,...,x_n)$$ are

$$ˆμ=1nn∑j=1log(xj);ˆσ2=1nn∑j=1(log(xj)−ˆμ)2.\hat\mu= \dfrac{1}{n}\sum_{j=1}^n\log(x_j);\,\,\,\hat\sigma^2=\dfrac{1}{n}\sum_{j=1}^n(\log(x_j)-\hat\mu)^2.$$

Then, the MLE of the mean is $$ˆδ=exp(ˆμ+ˆσ2/2)\hat\delta=\exp(\hat\mu+\hat\sigma^2/2)$$. By resampling we can obtain a bootstrap sample of $$ˆδ\hat\delta$$ and, using this, we can calculate several bootstrap confidence intervals. The following R codes shows how to obtain these.

rm(list=ls())
library(boot)

set.seed(1)

# Simulated data
data0 = exp(rnorm(100))

# Statistic (MLE)

mle = function(dat){
m = mean(log(dat))
s = mean((log(dat)-m)^2)
return(exp(m+s/2))
}

# Bootstrap
boots.out = boot(data=data0, statistic=function(d, ind){mle(d[ind])}, R = 10000)
plot(density(boots.out$t)) # 4 types of Bootstrap confidence intervals boot.ci(boots.out, conf = 0.95, type = "all")  For the sample mean Now, considering the estimator $$˜δ=ˉx\tilde{\delta}=\bar{x}$$ instead of the MLE. Other type of estimators might be considered as well. rm(list=ls()) library(boot) set.seed(1) # Simulated data data0 = exp(rnorm(100)) # Statistic (MLE) samp.mean = function(dat) return(mean(dat)) # Bootstrap boots.out = boot(data=data0, statistic=function(d, ind){samp.mean(d[ind])}, R = 10000) plot(density(boots.out$t))

# 4 types of Bootstrap confidence intervals
boot.ci(boots.out, conf = 0.95, type = "all")


Profile likelihood

For the definition of likelihood and profile likelihood functions, see. Using the invariance property of the likelihood we can reparameterise as follows $$(μ,σ)→(δ,σ)(\mu,\sigma)\rightarrow(\delta,\sigma)$$, where $$δ=exp(μ+σ2/2)\delta=\exp(\mu+\sigma^2/2)$$ and then calculate numerically the profile likelihood of $$δ\delta$$.

$$Rp(δ)=supR_p(\delta)=\dfrac{\sup_{\sigma}{\mathcal L}(\delta,\sigma)}{\sup_{\delta,\sigma}{\mathcal L}(\delta,\sigma)}.$$

This function takes values in $$(0,1](0,1]$$; an interval of level $$0.1470.147$$ has an approximate confidence of $$95\%95\%$$. We are going to use this property for constructing a confidence interval for $$\delta\delta$$. The following R codes shows how to obtain this interval.

set.seed(1)

# Simulated data
data0 = exp(rnorm(100))

# Log likelihood
ll = function(mu,sigma) return( sum(log(dlnorm(data0,mu,sigma))))

# Profile likelihood
Rp = function(delta){
temp = function(sigma) return( sum(log(dlnorm(data0,log(delta)-0.5*sigma^2,sigma)) ))
max=exp(optimize(temp,c(0.25,1.5),maximum=TRUE)$objective -ll(mean(log(data0)),sqrt(mean((log(data0)-mean(log(data0)))^2)))) return(max) } vec = seq(1.2,2.5,0.001) rvec = lapply(vec,Rp) plot(vec,rvec,type="l") # Profile confidence intervals tr = function(delta) return(Rp(delta)-0.147) c(uniroot(tr,c(1.2,1.6))$$root,uniroot(tr,c(2,2.3))$$root)  $$\star\star$$ Bayesian In this section, an alternative algorithm, based on Metropolis-Hastings sampling and the use of the Jeffreys prior, for calculating a credibility interval for $$\delta\delta$$ is presented. Recall that the Jeffreys prior for $$(\mu,\sigma)(\mu,\sigma)$$ in a lognormal model is $$\pi(\mu,\sigma)\propto \sigma^{-2},\pi(\mu,\sigma)\propto \sigma^{-2},$$ and that this prior is invariant under reparameterisations. This prior is improper, but the posterior of the parameters is proper if the sample size $$n\geq 2n\geq 2$$. The following R code shows how to obtain a 95% credibility interval using this Bayesian model. library(mcmc) set.seed(1) # Simulated data data0 = exp(rnorm(100)) # Log posterior lp = function(par){ if(par[2]>0) return( sum(log(dlnorm(data0,par[1],par[2]))) - 2*log(par[2])) else return(-Inf) } # Metropolis-Hastings NMH = 260000 out = metrop(lp, scale = 0.175, initial = c(0.1,0.8), nbatch = NMH) #Acceptance rate out$acc

deltap = exp(  out$$batch[,1][seq(10000,NMH,25)] + 0.5*(out$$batch[,2][seq(10000,NMH,25)])^2  )

plot(density(deltap))

# 95% credibility interval
c(quantile(deltap,0.025),quantile(deltap,0.975))


Note that they are very similar.