# Maximum gap between samples drawn without replacement from a discrete uniform distribution

This problem is related to my lab’s research in robotic coverage:

Randomly draw $n$ numbers from the set $\{1,2,\ldots,m\}$ without replacement, and sort the numbers in ascending order. $1\le n\le m$.

From this sorted list of numbers $\{a_{(1)},a_{(2)},…,a_{(n)}\}$, generate the difference between consecutive numbers and the boundaries: $g = \{a_{(1)},a_{(2)}−a_{(1)},\ldots,a_{(n)}−a_{(n-1)},m+1-a_{(n)}\}$. This gives $n+1$ gaps.

What is the distribution of the maximum gap?

$P(\max(g) = k) = P(k;m,n) = ?$

This can be framed using order statistics:
$P(g_{(n+1)} = k) = P(k;m,n) = ?$

See link for the distribution of gaps, but this question asks the distribution of the maximum gap.

I’d be satisfied with the average value, $\mathbb{E}[g_{(n+1)}]$.

If $n=m$ all the gaps are size 1.
If $n+1 = m$ there is one gap of size $2$, and $n+1$ possible locations.
The maximum gap size is $m-n+1$, and this gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible positions.
The smallest maximum gap size is $\lceil\frac{m-n}{n+1}\rceil$.
Define the probability of any given combination $T= {m \choose n}^{-1}$.

I’ve partially solved the probability mass function as
$P(g_{(n+1)} = k) = P(k;m,n) = \begin{cases} 0 & k < \lceil\frac{m-n}{n+1}\rceil\\ 1 & k = \frac{m-n}{n+1} \\ 1 & k = 1 \text{ (occurs when m=n)} \\ T(n+1)& k = 2 \text{ (occurs when m=n+1)} \\ T(n+1)& k = \frac{m-(n-1)}{n} \\ ? & \frac{m-(n-1)}{n} \le k \le m-n+1 \\ T(n+1)& k = m-n+1\\ 0 & k > m-n+1 \end{cases} \tag{1}$

Current work (1):
The equation for the first gap, $a_{(1)}$ is straightforward:

The expected value has a simple value:
$\mathbb{E}[P(a_{(1)})] = \frac{1}{ {m \choose n}} \sum_{k=1}^{m-n+1} {m-k-1 \choose n-1} k = \frac{m-n}{1+n}$.
By symmetry, I expect all $n$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n$ times.

Current work (2): it is easy to run Monte Carlo simulations.

simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]];
m = 1000; n = 1; trials = 100000;
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
Frame -> {True, True, False, False},
FrameLabel -> {"k (Max gap)", "Probability"},
PlotLabel -> StringForm["m=,n=,smooth histogram of maximum map for  trials", m, n, trials]][![enter image description here][1]][1]


Let $f(g;n,m)$ be the chance that the minimum, $a_{(1)}$, equals $g$; that is, the sample consists of $g$ and an $n-1$-subset of $\{g+1,g+2,\ldots,m\}$. There are $\binom{m-g}{n-1}$ such subsets out of the $\binom{m}{n}$ equally likely subsets, whence

Adding $f(k;n,m)$ for all possible values of $k$ greater than $g$ yields the survival function

Let $G_{n,m}$ be the random variable given by the largest gap:

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.)
We will compute its survival function from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n=1$, for which it is obvious that

For larger $n\gt 1$, note that the event $G_{n,m}\gt g$ is the disjoint union of the event

for which the very first gap exceeds $g$, and the $g$ separate events

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

Fixing $g$ and laying out a two-way array indexed by $i=1,2,\ldots,n$ and $j=1,2,\ldots,m$, we may compute $P(g;n,m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g=1$ through $g=m-n+1$ can be constructed in $O(m^3n)$ operations.

These graphs show the survival function $g\to P(g;n,64)$ for $n=1,2,4,8,16,32,64$. As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g;n,m)$ can be obtained in many special cases, especially for large $n$, but I have not been able to obtain a closed formula that applies to all $g,n,m$. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g=0$:

This contour plot of the expectation shows contours at $2, 4, 6, \ldots, 32$, graduating from dark to light.