# What is the distribution of the value in a sample closest to a given value?

For a fixed value $$m$$, draw $$k$$ samples from a normal distribution, and select the one, say $$X$$, that is closest to $$m$$. What distribution will $$X$$ follow? It is similar to an extreme value distribution, but I cannot figure it out.

Let’s solve this for all distributions, normal or not.

To this end, let $$F$$ be the distribution function of each draw $$X$$ and let $$\epsilon \ge 0$$ be any possible distance to $$m.$$ The event “$$X$$ is within distance $$\epsilon$$ of $$m$$” is $$X\in[m-\epsilon, m+\epsilon].$$ According to the definition of $$F,$$ its probability can be expressed as

$$\Pr(|X-m|\le \epsilon) = F(m+\epsilon) - F(m-\epsilon) + \Pr(X=m-\epsilon).$$
(For a Normal distribution, or any continuous distribution, that last term is zero and can be ignored.)
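
As a quick numerical illustration, here is a minimal R sketch of this single-draw probability. The distributions and the values of $$m$$ and $$\epsilon$$ are illustrative assumptions, not taken from the question. The discrete case shows why the point-mass term is needed.

```r
# Continuous case: standard Normal, m = 1, eps = 0.5 (illustrative values).
m <- 1; eps <- 0.5
p_cont <- pnorm(m + eps) - pnorm(m - eps)   # the point-mass term is zero here

# Discrete case: Binomial(10, 1/2), m = 5, eps = 2 (illustrative values).
m <- 5; eps <- 2
p_formula <- pbinom(m + eps, 10, 0.5) - pbinom(m - eps, 10, 0.5) +
  dbinom(m - eps, 10, 0.5)                  # add back Pr(X = m - eps)

# Direct summation of Pr(3 <= X <= 7) for comparison.
p_direct <- sum(dbinom((m - eps):(m + eps), 10, 0.5))

print(c(continuous = p_cont, formula = p_formula, direct = p_direct))
```

Without the `dbinom` term the formula would leave out the probability that $$X$$ sits exactly on the left endpoint.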

The chance this does not occur is its complement,

$$\Pr(|X-m|\gt \epsilon) = 1- \Pr(|X-m|\le \epsilon).$$

For a random sample of $$n$$ independent values, these probabilities multiply (that’s the definition of independence). Consequently, the chance that all values in the sample are farther than $$\epsilon$$ from $$m$$ is

$$\Pr(|X_i-m|\gt \epsilon\ \forall i) = \left[1- \Pr(|X-m|\le \epsilon)\right]^n.$$

Its complement therefore is the chance that at least one of the $$X_i$$ is within distance $$\epsilon$$ of $$m.$$ This is precisely the distribution function of the nearest distance. Writing $$E = \min|X_i-m|$$ for that distance, we have found

$$F_E(\epsilon) = \Pr(E\le \epsilon) = 1 - \left[1- \Pr(|X-m|\le \epsilon)\right]^n.$$
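
A simulation can serve as a rough check of this formula. The sketch below assumes a standard Normal $$F$$ and uses illustrative values for $$n,$$ $$m,$$ and $$\epsilon;$$ the seed and the number of replications are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 5; m <- 1; eps <- 0.3       # illustrative values

# Formula: F_E(eps) = 1 - [1 - Pr(|X - m| <= eps)]^n, with F = pnorm.
p_single <- pnorm(m + eps) - pnorm(m - eps)
F_E_formula <- 1 - (1 - p_single)^n

# Simulation: the nearest distance to m in each of many samples of size n.
sims <- 1e5
x <- matrix(rnorm(n * sims), nrow = sims)   # one sample per row
E <- apply(abs(x - m), 1, min)              # nearest distance in each row
F_E_sim <- mean(E <= eps)                   # empirical Pr(E <= eps)

print(c(formula = F_E_formula, simulated = F_E_sim))
```

The two numbers should agree up to Monte Carlo error, which is on the order of $$1/\sqrt{\text{sims}}.$$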

This is a thorough and fully general answer. When $$F$$ is continuous at $$m\pm\epsilon$$ (with density function $$f$$) though, we can (a) neglect that last probability term and (b) differentiate the expression to obtain a density for $$E,$$

$$f_E(\epsilon) = \frac{\mathrm d}{\mathrm{d}\epsilon} F_E(\epsilon) = n\left[1 - F(m+\epsilon) + F(m-\epsilon)\right]^{n-1} \left(f(m+\epsilon) + f(m-\epsilon)\right).$$
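
The differentiation step can be sanity-checked numerically. This is a sketch for the standard Normal with illustrative values $$n = 5$$ and $$m = 1$$ (assumptions, not values from the question). Here `f_E` is coded as the chain-rule derivative of $$1 - \left[1 - F(m+\epsilon) + F(m-\epsilon)\right]^n.$$

```r
n <- 5; m <- 1                   # illustrative values

# Distribution function and density of the nearest distance, with F = pnorm.
F_E <- function(e) 1 - (1 - (pnorm(m + e) - pnorm(m - e)))^n
f_E <- function(e) n * (1 - (pnorm(m + e) - pnorm(m - e)))^(n - 1) *
  (dnorm(m + e) + dnorm(m - e))

# (1) A density must integrate to 1 over all possible distances.
total <- integrate(f_E, 0, Inf)$value

# (2) A central finite difference of F_E should match f_E at any point.
e0 <- 0.4; h <- 1e-5
fd <- (F_E(e0 + h) - F_E(e0 - h)) / (2 * h)

print(c(total = total, finite_diff = fd, density = f_E(e0)))
```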

Here are some plots of $$f_E$$ for various sample sizes from the standard Normal distribution. It all makes sense: as you look from left to right, the sample size increases and therefore the chance of being close to any given $$m$$ increases. As $$m$$ increases from $$0$$ (the mode) to $$4$$ (far out into the right tail), the chance of being close to $$m$$ decreases, so the typical nearest distance to $$m$$ grows.

In a similar fashion you can write the (more complicated) formula for the signed distance between the nearest $$X$$ and $$m.$$ Adding $$m$$ to this will produce a distribution of the nearest $$X,$$ if that’s what you want.
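
One way to sketch the distribution of the nearest $$X$$ itself, assuming a continuous $$F$$: the nearest value falls at $$y$$ when one of the $$n$$ draws is at $$y$$ and the other $$n-1$$ are all farther than $$|y-m|$$ from $$m.$$ That suggests the density $$n f(y)\left[1 - F(m+|y-m|) + F(m-|y-m|)\right]^{n-1}.$$ The function name `dnormnearest` and the values $$m = 1,$$ $$n = 5$$ are hypothetical choices for illustration.

```r
# Density of the value nearest to m among n Normal(mu, sigma) draws (a sketch).
dnormnearest <- function(y, m, n = 1, mu = 0, sigma = 1) {
  d <- abs(y - m)                                  # distance of y from m
  far <- pnorm(m - d, mu, sigma) +
    pnorm(m + d, mu, sigma, lower.tail = FALSE)    # Pr(a single draw is farther than d)
  n * dnorm(y, mu, sigma) * far^(n - 1)
}

m <- 1; n <- 5                                     # illustrative values

# The density has a kink at y = m, so integrate each side separately.
total <- integrate(dnormnearest, -Inf, m, m = m, n = n)$value +
  integrate(dnormnearest, m, Inf, m = m, n = n)$value

# Compare Pr(nearest X <= m) from the density with a simulation.
set.seed(1)                                        # arbitrary seed
x <- matrix(rnorm(n * 1e5), ncol = n)              # one sample per row
nearest <- apply(x, 1, function(r) r[which.min(abs(r - m))])
p_sim <- mean(nearest <= m)
p_int <- integrate(dnormnearest, -Inf, m, m = m, n = n)$value

print(c(total = total, integrated = p_int, simulated = p_sim))
```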

This is the R code used to generate the figure. It implements $$F_E$$ as pnormclosest and $$f_E$$ as dnormclosest. They are readily modified to handle any distribution $$F$$ by replacing pnorm and dnorm with that distribution's distribution and density functions, respectively.

    # Distribution function F_E of the nearest distance x to m in a sample of size n.
    pnormclosest <- function(x, m, n=1, mu=0, sigma=1) {
      1 - (pnorm(m-x, mu, sigma) + pnorm(m+x, mu, sigma, lower.tail=FALSE))^n
    }
    # Density f_E of the nearest distance.
    dnormclosest <- function(x, m, n=1, mu=0, sigma=1) {
      n * (pnorm(m-x, mu, sigma) + pnorm(m+x, mu, sigma, lower.tail=FALSE))^(n-1) *
        (dnorm(m-x, mu, sigma) + dnorm(m+x, mu, sigma))
    }

    ns <- c(1, 2, 20, 100)   # sample sizes, one panel each
    ms <- c(0, 1, 2, 4)      # target values m, one curve each
    par(mfrow = c(1, length(ns)))
    for (n in ns) {
      # The first m starts a new panel; the remaining curves are added to it.
      for (m in ms) curve(dnormclosest(x, m, n), 0, 3, ylim=c(0,2), add=(m != ms[1]),
                          lwd=2, lty=abs(m)+1, col=hsv(abs(m)/(max(abs(ms))+1), .9, .8),
                          xlab="Distance", ylab="Density",
                          main=paste0("Sample size ", n))
      legend("topright", bty="n", title="m", legend=ms, lty=abs(ms)+1, lwd=2,
             col=hsv(abs(ms)/(max(abs(ms))+1), .9, .8))
    }
    par(mfrow=c(1,1))
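
The substitution of another distribution can be sketched generically by passing the distribution and density functions as arguments. The names `pclosest` and `dclosest` and the arguments `cdf` and `dens` are hypothetical, not part of the code above, and the sketch assumes a continuous distribution.

```r
# Generic versions: cdf and dens are the distribution and density functions of F;
# extra arguments in ... (such as rate) are passed on to both.
pclosest <- function(x, m, n = 1, cdf = pnorm, ...) {
  # 1 - cdf() replaces lower.tail = FALSE so any cdf works (slightly less
  # accurate far in the upper tail).
  1 - (cdf(m - x, ...) + 1 - cdf(m + x, ...))^n
}
dclosest <- function(x, m, n = 1, cdf = pnorm, dens = dnorm, ...) {
  n * (cdf(m - x, ...) + 1 - cdf(m + x, ...))^(n - 1) *
    (dens(m - x, ...) + dens(m + x, ...))
}

# Example: nearest distance to m = 2 among n = 10 Exponential(rate = 1) draws.
curve(dclosest(x, m = 2, n = 10, cdf = pexp, dens = dexp, rate = 1), 0, 3,
      xlab = "Distance", ylab = "Density")
```

With the defaults `cdf = pnorm` and `dens = dnorm` these reduce to the standard Normal case.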