Maximum gap between samples drawn without replacement from a discrete uniform distribution

This problem is related to my lab’s research in robotic coverage:

Randomly draw n numbers from the set {1,2,,m} without replacement, and sort the numbers in ascending order. 1nm.

From this sorted list of numbers {a(1),a(2),,a(n)}, generate the difference between consecutive numbers and the boundaries: g={a(1),a(2)a(1),,a(n)a(n1),m+1a(n)}. This gives n+1 gaps.

What is the distribution of the maximum gap?

P(max

This can be framed using order statistics:
P(g_{(n+1)} = k) = P(k;m,n) = ?

See link for the distribution of gaps, but this question asks the distribution of the maximum gap.

I’d be satisfied with the average value, \mathbb{E}[g_{(n+1)}].

If n=m all the gaps are size 1.
If n+1 = m there is one gap of size 2, and n+1 possible locations.
The maximum gap size is m-n+1, and this gap can be placed before or after any of the n numbers, for a total of n+1 possible positions.
The smallest maximum gap size is \lceil\frac{m-n}{n+1}\rceil.
Define the probability of any given combination T= {m \choose n}^{-1}.

I’ve partially solved the probability mass function as

P(g_{(n+1)} = k) = P(k;m,n) =
\begin{cases}
0 & k < \lceil\frac{m-n}{n+1}\rceil\\
1 & k = \frac{m-n}{n+1} \\
1 & k = 1 \text{ (occurs when $m=n$)} \\
T(n+1)& k = 2 \text{ (occurs when $m=n+1$)} \\
T(n+1)& k = \frac{m-(n-1)}{n} \\
? & \frac{m-(n-1)}{n} \le k \le m-n+1 \\
T(n+1)& k = m-n+1\\
0 & k > m-n+1
\end{cases} \tag{1}

Current work (1):
The equation for the first gap, a_{(1)} is straightforward:

P(a_{(1)} = k) = P(k;m,n) = \frac{1}{{m \choose n}}
\sum_{k=1}^{m-n+1} {m-k-1 \choose n-1}

The expected value has a simple value:
\mathbb{E}[P(a_{(1)})] = \frac{1}{
{m \choose n}}
\sum_{k=1}^{m-n+1} {m-k-1 \choose n-1} k = \frac{m-n}{1+n}
.
By symmetry, I expect all n gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution n times.

Current work (2): it is easy to run Monte Carlo simulations.

simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]];
m = 1000; n = 1; trials = 100000;
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
Frame -> {True, True, False, False},
FrameLabel -> {"k (Max gap)", "Probability"},
PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum map for `` trials", m, n, trials]][![enter image description here][1]][1]

Answer

Let f(g;n,m) be the chance that the minimum, a_{(1)}, equals g; that is, the sample consists of g and an n-1-subset of \{g+1,g+2,\ldots,m\}. There are \binom{m-g}{n-1} such subsets out of the \binom{m}{n} equally likely subsets, whence

\Pr(a_{(1)}=g = f(g;n,m) = \frac{\binom{m-g}{n-1}}{\binom{m}{n}}.

Adding f(k;n,m) for all possible values of k greater than g yields the survival function

\Pr(a_{(1)} \gt g) = Q(g;n,m)= \frac{(m-g)\binom{m-g-1}{n-1}}{n \binom{m}{n}}.

Let G_{n,m} be the random variable given by the largest gap:

G_{n,m} = \max\left(a_{(1)}, a_{(2)}-a_{(1)}, \ldots, a_{(n)}-a_{(n-1)}\right).

(This responds to the question as originally framed, before it was modified to include a gap between a_{(n)} and m.)
We will compute its survival function P(g;n,m)=\Pr(G_{n,m}\gt g), from which the entire distribution of G_{n,m} is readily derived. The method is a dynamic program beginning with n=1, for which it is obvious that

P(g;1,m) = \Pr(G_{1,m} \gt 1) = \frac{m-g}{m},\ g=0, 1, \ldots, m.\tag{1}

For larger n\gt 1, note that the event G_{n,m}\gt g is the disjoint union of the event

a_{1} \gt g,

for which the very first gap exceeds g, and the g separate events

a_{1}=k\text{ and } G_{n-1,m-k} \gt g, \ k=1, 2, \ldots, g

for which the first gap equals k and a gap greater than g occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

P(g;n,m) = Q(g;n,m) + \sum_{k=1}^g f(k;n,m) P(g;n-1,m-k).\tag{2}

Fixing g and laying out a two-way array indexed by i=1,2,\ldots,n and j=1,2,\ldots,m, we may compute P(g;n,m) by using (1) to fill in its first row and (2) to fill in each successive row using O(gm) operations per row. Consequently the table can be completed in O(gmn) operations and all tables for g=1 through g=m-n+1 can be constructed in O(m^3n) operations.

Figure

These graphs show the survival function g\to P(g;n,64) for n=1,2,4,8,16,32,64. As n increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for P(g;n,m) can be obtained in many special cases, especially for large n, but I have not been able to obtain a closed formula that applies to all g,n,m. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of G_{n,m} is obtained by summing its survival function starting at g=0:

\mathbb{E}(G_{n,m}) = \sum_{g=0}^{m-n+1} P(g;n,m).

Figure 2: contour plot of expectation

This contour plot of the expectation shows contours at 2, 4, 6, \ldots, 32, graduating from dark to light.

Attribution
Source : Link , Question Author : AaronBecker , Answer Author : whuber

Leave a Comment