What is the distribution of the value in a sample closest to a given value?

For a fixed value $m$, draw $k$ samples from a normal distribution and select the one, say $X$, that is closest to $m$. What distribution will $X$ follow? It seems similar to an extreme value distribution, but I cannot figure it out.

Answer

Let’s solve this for all distributions, normal or not.

To this end, let the distribution function be $F$ and let $\epsilon \ge 0$ be any possible distance to $m.$ The event “$X$ is within distance $\epsilon$ of $m$” is the interval $X\in[m-\epsilon, m+\epsilon].$ According to the definition of $F,$ this can be expressed as

$$\Pr(|X-m|\le \epsilon) = F(m+\epsilon) - F(m-\epsilon) + \Pr(X=m-\epsilon).$$
(For a Normal distribution, or any continuous distribution, that last term is zero and can be ignored.)
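
As a quick check for the standard Normal (where that last term is zero), here is a small simulation sketch; the values of $m$ and $\epsilon$ are arbitrary choices.

set.seed(17)
m <- 1; eps <- 0.5
x <- rnorm(1e6)                          # one million standard Normal draws
c(simulated = mean(abs(x - m) <= eps),   # empirical Pr(|X - m| <= eps)
  formula   = pnorm(m + eps) - pnorm(m - eps))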

The chance that $X$ is not within distance $\epsilon$ of $m$ is the complement,

$$\Pr(|X-m|\gt \epsilon) = 1- \Pr(|X-m|\le \epsilon).$$

For a random sample of $n$ independent values, these probabilities multiply (that's the definition of independence). Consequently, the chance that all values in the sample lie farther than $\epsilon$ from $m$ is

$$\Pr(|X_i-m|\gt \epsilon\ \forall i) = \left[1- \Pr(|X-m|\le \epsilon)\right]^n.$$

Its complement therefore is the chance that at least one of the $X_i$ is within distance $\epsilon$ of $m.$ This is precisely the distribution function of the nearest distance. Writing $E = \min|X_i-m|$ for that distance, we have found

$$F_E(\epsilon) = \Pr(E\le \epsilon) = 1 - \left[1- \Pr(|X-m|\le \epsilon)\right]^n.$$
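
This formula is easy to check by simulation. The following sketch (with arbitrary choices of $m$, $n$, and $\epsilon$) draws many standard Normal samples of size $n$, records the nearest distance to $m$ in each, and compares the empirical value of $\Pr(E\le\epsilon)$ with the formula.

set.seed(17)
m <- 1; n <- 20; eps <- 0.25; sims <- 1e5
# Each row of the matrix is one sample of size n; E is its nearest distance to m.
E <- apply(matrix(rnorm(sims * n), ncol = n), 1, function(x) min(abs(x - m)))
c(simulated = mean(E <= eps),
  formula   = 1 - (1 - (pnorm(m + eps) - pnorm(m - eps)))^n)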

This is a thorough and fully general answer. When $F$ is continuous at $m\pm\epsilon$ (with density function $f$), though, we can (a) neglect that last probability term and (b) differentiate the expression to obtain a density for $E,$

$$f_E(\epsilon) = \frac{\mathrm d}{\mathrm{d}\epsilon} F_E(\epsilon) = n\left[1 - \left(F(m+\epsilon) - F(m-\epsilon)\right)\right]^{n-1} \left(f(m+\epsilon) + f(m-\epsilon)\right).$$
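
As a check on the differentiation, this sketch (again with arbitrary $m$ and $n$) verifies for the standard Normal that the density integrates to $1$ and agrees with a numerical derivative of $F_E$.

m <- 2; n <- 5
F_E <- function(e) 1 - (1 - (pnorm(m + e) - pnorm(m - e)))^n
f_E <- function(e) n * (1 - (pnorm(m + e) - pnorm(m - e)))^(n - 1) *
                     (dnorm(m + e) + dnorm(m - e))
integrate(f_E, 0, Inf)$value          # should be very close to 1
h <- 1e-6; e <- 0.7                   # central difference at an arbitrary point
c(density = f_E(e), numerical.derivative = (F_E(e + h) - F_E(e - h)) / (2 * h))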

Here are some plots of $f_E$ for various sample sizes from the standard Normal distribution.

[Figure: densities $f_E$ of the nearest distance, one panel per sample size $n = 1, 2, 20, 100$, with curves for $m = 0, 1, 2, 4$.]

It all makes sense: as you look from left to right, the sample size increases and therefore the chance of being close to any given $m$ increases. As $m$ increases from $0$ (the mode) to $4$ (far out in the right tail), the sample values tend to lie farther from $m$, so the typical nearest distance to $m$ grows.

In a similar fashion you can write the (more complicated) formula for the signed distance between the nearest $X$ and $m.$ Adding $m$ to this will produce the distribution of the nearest $X$ itself, if that's what you want.
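
A quick way to see that distribution is to simulate it directly: the sketch below (arbitrary $m$, $n$, and replication count) picks, in each replicated standard Normal sample, the value closest to $m$.

set.seed(17)
m <- 1; n <- 20; sims <- 1e4
# For each sample (row), keep the value whose distance to m is smallest.
nearest <- apply(matrix(rnorm(sims * n), ncol = n), 1,
                 function(x) x[which.min(abs(x - m))])
hist(nearest, breaks = 50, freq = FALSE, xlab = "Nearest X",
     main = paste0("Value nearest m = ", m, " in samples of size ", n))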


This is the R code used to generate the figure. It implements $F_E$ as pnormclosest and $f_E$ as dnormclosest. They are readily modified to handle any distribution $F$ by replacing pnorm and dnorm with its distribution and density functions, respectively.

# F_E: the chance that at least one of n Normal(mu, sigma) draws lies within distance x of m.
pnormclosest <- function(x, m, n=1, mu=0, sigma=1) {
  1 - (pnorm(m-x, mu, sigma) + pnorm(m+x, mu, sigma, lower.tail=FALSE))^n
}
# f_E: the density of the nearest distance, the derivative of pnormclosest in x.
dnormclosest <- function(x, m, n=1, mu=0, sigma=1) {
  n * (pnorm(m-x, mu, sigma) + pnorm(m+x, mu, sigma, lower.tail=FALSE))^(n-1) *
    (dnorm(m-x, mu, sigma) + dnorm(m+x, mu, sigma))
}

ns <- c(1, 2, 20, 100)  # sample sizes, one panel per value
ms <- c(0, 1, 2, 4)     # values of m, one curve per value
par(mfrow = c(1, length(ns)))
for (n in ns) {
  # Overlay the densities of the nearest distance for every m in one panel.
  for (m in ms) curve(dnormclosest(x, m, n), 0, 3, ylim=c(0,2), add=m != 0,
                      lwd=2, lty=abs(m)+1, col=hsv(abs(m)/(max(abs(ms))+1), .9, .8),
                      xlab="Distance", ylab="Density",
                      main=paste0("Sample size ", n))
  legend("topright", bty="n", title="m", legend=ms, lty=abs(ms)+1, lwd=2,
         col=hsv(abs(ms)/(max(abs(ms))+1), .9, .8))
}
par(mfrow=c(1,1))
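
As an example of the substitution mentioned above, here is a variant for an Exponential distribution (a sketch; the names pexpclosest and dexpclosest are made up for this illustration): pnorm and dnorm are simply replaced by pexp and dexp.

# Hypothetical Exponential(rate) versions of the Normal functions above.
pexpclosest <- function(x, m, n=1, rate=1) {
  1 - (pexp(m-x, rate) + pexp(m+x, rate, lower.tail=FALSE))^n
}
dexpclosest <- function(x, m, n=1, rate=1) {
  n * (pexp(m-x, rate) + pexp(m+x, rate, lower.tail=FALSE))^(n-1) *
    (dexp(m-x, rate) + dexp(m+x, rate))
}
curve(dexpclosest(x, m=1, n=10), 0, 2, xlab="Distance", ylab="Density")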

Attribution
Source: Link, Question Author: Rafa Zhang, Answer Author: whuber
