Asymptotic distribution of censored samples from $\exp(\lambda)$

Let $X_{(1)}, \ldots, X_{(n)}$ be the order statistics of an i.i.d. sample of size $n$ from $\exp(\lambda)$. Suppose the data are censored so that we see only the top $(1-p)\times 100$ percent of the data, that is, $X_{(pn)}, X_{(pn+1)}, \ldots, X_{(n)}$. Put $m = pn$: what is the asymptotic distribution of
$$\left(X_{(m)},\ \frac{\sum_{i=m+1}^{n} X_{(i)}}{n-m}\right)?$$


Any help would be appreciated. I tried different approaches but was not able to progress much.

Answer

Since $\lambda$ is just a scale factor, without loss of generality choose units of measurement that make $\lambda = 1$, making the underlying distribution function $F(x) = 1 - \exp(-x)$ with density $f(x) = \exp(-x)$.
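
As a quick check of this reduction, here is a minimal sketch (the rate $\lambda = 2.5$ below is an arbitrary choice for illustration): rescaling an $\exp(\lambda)$ sample by $\lambda$ should make it indistinguishable from an $\exp(1)$ sample.

lambda <- 2.5                  # arbitrary rate, used only for this check
z <- rexp(1e5, rate = lambda)  # sample from exp(lambda)
ks.test(lambda * z, pexp)      # rescaled sample against the exp(1) CDF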

From considerations paralleling those at "Central limit theorem for sample medians," $X_{(m)}$ is asymptotically Normal with mean $F^{-1}(p) = -\log(1-p)$ and variance

$$\operatorname{Var}\left(X_{(m)}\right) = \frac{p(1-p)}{n\, f\left(-\log(1-p)\right)^2} = \frac{p}{n(1-p)}.$$
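
As a minimal simulation check of this quantile approximation (still assuming $\lambda = 1$; the values of $n$, $p$, and the number of replications below are arbitrary), the empirical mean and variance of $X_{(m)}$ should be close to the asymptotic values:

n <- 1e3; p <- 0.95; m <- floor(p * n)
xm <- replicate(2e3, sort(rexp(n))[m])  # repeated draws of X_(m)
c(mean(xm), -log(1 - p))                # empirical vs. asymptotic mean
c(var(xm), p / (n * (1 - p)))           # empirical vs. asymptotic variance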

Due to the memoryless property of the exponential distribution, conditional on $X_{(m)}$ the variables $(X_{(m+1)}, \ldots, X_{(n)})$ act like the order statistics of a random sample of $n-m$ draws from $F$, to which $X_{(m)}$ has been added. Writing

$$Y = \frac{1}{n-m}\sum_{i=m+1}^{n} X_{(i)}$$

for their mean, it is immediate that, conditional on $X_{(m)}$, the mean of $Y - X_{(m)}$ is the mean of $F$ (equal to $1$) and the variance of $Y - X_{(m)}$ is $1/(n-m)$ times the variance of $F$ (also equal to $1$). The Central Limit Theorem implies the standardized version of $Y - X_{(m)}$ is asymptotically Standard Normal. Moreover, because $Y - X_{(m)}$ is independent of $X_{(m)}$, we simultaneously have the standardized version of $X_{(m)}$ becoming asymptotically Standard Normal and uncorrelated with the standardized $Y - X_{(m)}$. That is,

$$\left(\frac{X_{(m)} + \log(1-p)}{\sqrt{p/(n(1-p))}},\ \frac{Y - X_{(m)} - 1}{\sqrt{1/(n-m)}}\right) \tag{1}$$

asymptotically has a bivariate Standard Normal distribution.
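
The memoryless decomposition underlying this can be checked directly: conditional on $X_{(m)}$, the excesses $X_{(i)} - X_{(m)}$ for $i > m$ should behave like an i.i.d. $\exp(1)$ sample of size $n - m$. A minimal sketch, again with $\lambda = 1$ and arbitrary $n$ and $p$:

set.seed(17)
n <- 1e3; p <- 0.95; m <- floor(p * n)
x <- sort(rexp(n))
excess <- x[(m + 1):n] - x[m]  # excesses over the censoring point
c(mean(excess), var(excess))   # both should be near 1
ks.test(excess, pexp)          # compare with the exp(1) CDF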


The graphics report on simulated data for samples of $n = 1000$ ($5{,}000$ iterations) with $p = 0.95$. A trace of positive skewness remains, but the approach to bivariate normality is evident in the lack of relationship between $Y - X_{(m)}$ and $X_{(m)}$ and in the closeness of the histograms to the Standard Normal density (drawn as dotted red curves).
[Figure: scatterplots of $Y$ versus $X_{(m)}$ and of $Y - X_{(m)}$ versus $X_{(m)}$ (with a loess smooth), and histograms of the standardized $X_{(m)}$ and $Y - X_{(m)}$ with the Standard Normal density overlaid.]

The covariance matrix of the standardized values (as in formula $(1)$) for this simulation was

$$\begin{pmatrix} 0.967 & 0.021 \\ 0.021 & 1.010 \end{pmatrix},$$

comfortably close to the unit matrix it approximates.

The R code that produced these graphics is readily modified to study other values of n, p, and simulation size.

n <- 1e3
p <- 0.95
n.sim <- 5e3
#
# Perform the simulation.
# X_m will be in the first column and Y in the second.
#
set.seed(17)
m <- floor(p * n)
X <- apply(matrix(rexp(n.sim * n), nrow = n), 2, sort)
X <- cbind(X[m, ], colMeans(X[(m+1):n, , drop=FALSE]))
#
# Display the results.
#
par(mfrow=c(2,2))

plot(X[,1], X[,2], pch=16, col="#00000020", 
     xlab=expression(X[(m)]), ylab="Y",
     main="Y vs X", sub=paste("n =", n, "and p =", signif(p, 2)))

plot(X[,1], X[,2]-X[,1], pch=16, col="#00000020", 
     xlab=expression(X[(m)]), ylab=expression(Y - X[(m)]),
     main="Y-X vs X", sub="Loess smooth shown")
lines(lowess(X[,2]-X[,1] ~ X[,1]), col="Red", lwd=3, lty=1)

x <- (X[,1] + log(1-p)) / sqrt(p/(n*(1-p)))  # standardized X_(m), as in formula (1)
hist(x, main="Standardized X", freq=FALSE, xlab="Value")
curve(dnorm(x), add=TRUE, col="Red", lty=3, lwd=2)

y <- (X[,2] - X[,1] - 1) * sqrt(n-m)  # standardized Y - X_(m), as in formula (1)
hist(y, main="Standardized Y-X", freq=FALSE, xlab="Value")
curve(dnorm(x), add=TRUE, col="Red", lty=3, lwd=2)
par(mfrow=c(1,1))

round(var(cbind(x,y)), 3) # Should be close to the unit matrix
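
Finally, because the analysis fixed $\lambda = 1$, results for a general rate follow by rescaling: the asymptotic means become $-\log(1-p)/\lambda$ and $1/\lambda$, and the asymptotic standard deviations are likewise divided by $\lambda$. A minimal check of this rescaling (the value $\lambda = 2$ and the simulation size are arbitrary choices):

lambda <- 2; n <- 1e3; p <- 0.95; m <- floor(p * n); n.sim <- 2e3
set.seed(17)
sims <- replicate(n.sim, {
  x <- sort(rexp(n, rate = lambda))
  c(x[m], mean(x[(m+1):n]) - x[m])  # (X_(m), Y - X_(m)) for one sample
})
rbind(simulated  = rowMeans(sims),
      asymptotic = c(-log(1 - p), 1) / lambda)
rbind(simulated  = apply(sims, 1, sd),
      asymptotic = c(sqrt(p / (n * (1 - p))), 1 / sqrt(n - m)) / lambda)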

Attribution
Source: Link, Question Author: them, Answer Author: whuber
