Draw integers independently & uniformly at random from 1 to NN using fair d6?

I wish to draw integers from 1 to some specific N by rolling some number of fair six-sided dice (d6). A good answer will explain why its method produces uniform and independent integers.

As an illustrative example, it would be helpful to explain how a solution works for the case of N=150.

Furthermore, I wish for the procedure to be as efficient as possible: roll the least number of d6 on average for each number generated.

Conversions from senary to decimal are permissible.


This question was inspired by this Meta thread.

Answer

The set Ω(d,n) of distinct identifiable outcomes in n independent rolls of a die with d=6 faces has dn elements. When the die is fair, that means each outcome of one roll has probability 1/d and independence means each of these outcomes will therefore have probability (1/d)n: that is, they have a uniform distribution Pd,n.

Suppose you have devised some procedure t that either determines m outcomes of a c(=150)-sided die–that is, an element of Ω(c,m)–or else reports failure (which means you will have to repeat it in order to obtain an outcome). That is,

t:Ω(d,n)Ω(c,m){Failure}.

Let F be the probability t results in failure and note that F is some integral multiple of dn, say

F=Pr

(For future reference, note that the expected number of times t must be invoked before not failing is 1/(1-F).)

The requirement that these outcomes in \Omega(c,m) be uniform and independent conditional on t not reporting failure means that t preserves probability in the sense that for every event \mathcal{A}\subset\Omega(c,m),

\frac{\mathbb{P}_{d,n}\left(t^{*}\mathcal{A}\right)}{1-F}= \mathbb{P}_{c,m}\left(\mathcal{A}\right) \tag{1}

where

t^{*}\left(\mathcal A\right) = \{\omega\in\Omega\mid t(\omega)\in\mathcal{A}\}

is the set of die rolls that the procedure t assigns to the event \mathcal A.

Consider an atomic event \mathcal A = \{\eta\}\subset\Omega(c,m), which must have probability c^{-m}. Let t^{*}\left(\mathcal A\right) (the dice rolls associated with \eta) have N_\eta elements. (1) becomes

\frac{N_\eta d^{-n}}{1 – N_F d^{-n}} = \frac{\mathbb{P}_{d,n}\left(t^{*}\mathcal{A}\right)}{1-F}= \mathbb{P}_{c,m}\left(\mathcal{A}\right) = c^{-m}.\tag{2}

It is immediate that the N_\eta are all equal to some integer N. It remains only to find the most efficient procedures t. The expected number of non-failures per roll of the c sided die is

\frac{1}{m}\left(1 – F\right).

There are two immediate and obvious implications. One is that if we can keep F small as m grows large, then the effect of reporting a failure is asymptotically zero. The other is that for any given m (the number of rolls of the c-sided die to simulate), we want to make F as small as possible.

Let’s take a closer look at (2) by clearing the denominators:

N c^m = d^n – N_F \gt 0.

This makes it obvious that in a given context (determined by c,d,n,m), F is made as small as possible by making d^n-N_F equal the largest multiple of c^m that is less than or equal to d^n. We may write this in terms of the greatest integer function (or “floor”) \lfloor*\rfloor as

N = \bigg\lfloor \frac{d^n}{c^m} \bigg\rfloor.

Finally, it is clear that N ought to be as small as possible for highest efficiency, because it measures redundancy in t. Specifically, the expected number of rolls of the d-sided die needed to produce one roll of the c-sided die is

N \times \frac{n}{m} \times \frac{1}{1-F}.

Thus, our search for high-efficiency procedures ought to focus on the cases where d^n is equal to, or just barely greater than, some power c^m.

The analysis ends by showing that for given d and c, there is a sequence of multiples (n,m) for which this approach approximates perfect efficiency. This amounts to finding (n,m) for which d^n/c^m \ge 1 approaches N=1 in the limit (automatically guaranteeing F\to 0). One such sequence is obtained by taking n=1,2,3,\ldots and determining

m = \bigg\lfloor \frac{n\log d}{\log c} \bigg\rfloor.\tag{3}

The proof is straightforward.

This all means that when we are willing to roll the original d-sided die a sufficiently large number of times n, we can expect to simulate nearly \log d / \log c = \log_c d outcomes of a c-sided die per roll. Equivalently,

It is possible to simulate a large number m of independent rolls of a c-sided die using a fair d-sided die using an average of \log(c)/\log(d) + \epsilon = \log_d(c) + \epsilon rolls per outcome where \epsilon can be made arbitrarily small by choosing m sufficiently large.


Examples and algorithms

In the question, d=6 and c=150, whence

\log_d(c) = \frac{\log(c)}{\log(d)} \approx 2.796489.

Thus, the best possible procedure will require, on average, at least 2.796489 rolls of a d6 to simulate each d150 outcome.

The analysis shows how to do this. We don’t need to resort to number theory to carry it out: we can just tabulate the powers d^n=6^n and the powers c^m=150^m and compare them to find where c^m \le d^n are close. This brute force calculation gives (n,m) pairs

(n,m) \in \{(3,1), (14,5), \ldots\}

for instance, corresponding to the numbers

(6^n, 150^m) \in \{(216,150), (78364164096,75937500000), \ldots\}.

In the first case t would associate 216-150=66 of the outcomes of three rolls of a d6 to Failure and the other 150 outcomes would each be associated with a single outcome of a d150.

In the second case t would associate 78364164096-75937500000 of the outcomes of 14 rolls of a d6 to Failure — about 3.1% of them all — and otherwise would output a sequence of 5 outcomes of a d150.

A simple algorithm to implement t labels the faces of the d-sided die with the numerals 0,1,\ldots, d-1 and the faces of the c-sided die with the numerals 0,1,\ldots, c-1. The n rolls of the first die are interpreted as an n-digit number in base d. This is converted to a number in base c. If it has at most m digits, the sequence of the last m digits is the output. Otherwise, t returns Failure by invoking itself recursively.

For much longer sequences, you can find suitable pairs (n,m) by considering every other convergent n/m of the continued fraction expansion of x=\log(c)/\log(d). The theory of continued fractions shows that these convergents alternate between being less than x and greater than it (assuming x is not already rational). Choose those that are less than x.

In the question, the first few such convergents are

3, 14/5, 165/59, 797/285, 4301/1538, 89043/31841, 279235/99852, 29036139/10383070 \ldots.

In the last case, a sequence of 29,036,139 rolls of a d6 will produce a sequence of 10,383,070 rolls of a d150 with a failure rate less than 2\times 10^{-8}, for an efficiency of 2.79649–indistinguishable from the asymptotic limit.

Attribution
Source : Link , Question Author : Sycorax , Answer Author : whuber

Leave a Comment