What is the PDF for the minimum difference between a random number and a set of random numbers

I have a list (lets call it $ \{L_N\} $) of N random numbers $R\in(0,1)$ (chosen from a uniform distribution). Next, I roll another random number from the same distribution (let’s call this number “b”).
Now I find the element in the list $ \{L_N\} $ that is the closest to the number “b” and find this distance.

If I repeat this process, I can plot the distribution of distances that are obtained through this process.

When $N\to \infty$, what does this distribution approach?

When I simulate this in Mathematica, it appears as though it approaches an exponential function. And if the list was 1 element long, then I believe this would exactly follow an exponential distribution.

Looking at the wikipedia for exponential distributions, I can see that there is some discussion on the topic:

enter image description here

But I’m having trouble interpreting what they are saying here. What is “k” here? Is my case what they are describing here in the limit where $n\to \infty$?

EDIT: After a very helpful helpful intuitive answer by Bayequentist, I understand now that the behavior as $N \to \infty$ should approach a dirac delta function. But I’d still like to understand why my data (which is like the minimum of a bunch of exponential distributions), appears to also be exponential. And is there a way that I can figure out what this distribution is exactly (for large but finite N)?

Here is a picture of what the such a distribution looks like for large but finite N:
enter image description here

EDIT2:
Here’s some python code to simulate these distributions:

%matplotlib inline
import math
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
numpoints = 10000
NBINS = 1000

randarray1 = np.random.random_sample((numpoints,))
randarray2 = np.random.random_sample((numpoints,))

dtbin = []

for i in range(len(t1)):
    dt = 10000000
    for j in range(len(t2)):
        delta = t1[i]-t2[j]
        if abs(delta) < abs(dt): 
            dt = delta
    dtbin.append(dt)

plt.figure()
plt.hist(dtbin, bins = NBINS)
plt.show()

Answer

If you had been looking for the distance to the next value above, and if you inserted an extra value at $1$ so this always had an answer, then using rotational symmetry the distribution of these distances $D$ would be the same as the distribution of the minimum of $n+1$ independent uniform random variables on $[0,1]$.

That would have $P(D \le d) = 1-(1-d)^{n+1}$ and so density $f(d)=(n+1)(1-d)^n$ when $0 \le d \le 1$. For large $n$ and small $d$ this density can be approximated by $f(d) \approx n e^{-nd}$, explaining the exponential shape you have spotted.

But your question is slightly more complicated, as you are interested in the signed distance to the nearest value above or below. As your Wikipedia link shows, the minimum of two i.i.d. exponential random variables with rate $\lambda$ is an exponential random variable with rate $2\lambda$. So you need to change the approximation to the density to reflect both the doubled rate and the possibility of negative values of $d$. The approximation actually becomes a Laplace distribution with $$f(d) \approx n e^{-2n|d|}$$ remembering this is for large $n$ and small $d$ (in particular the true density is $0$ unless $-\frac12 \lt d \lt \frac12$). As $n$ increases, this concentrates almost all the density at $0$ as in Bayequentist’s response of the limit of a Dirac delta distribution

With $n=10^6$ the approximation to the density would look like this, matching the shape of your simulated data.

enter image description here

Attribution
Source : Link , Question Author : Steven Sagona , Answer Author : Henry

Leave a Comment