I wish to draw integers from 1 to some specific N by rolling some number of fair six-sided dice (d6). A good answer will explain why its method produces

uniformandindependentintegers.As an illustrative example, it would be helpful to explain how a solution works for the case of N=150.

Furthermore, I wish for the procedure to be as efficient as possible: roll the least number of d6 on average for each number generated.

Conversions from senary to decimal are permissible.

**Answer**

The set Ω(d,n) of distinct identifiable outcomes in n independent rolls of a die with d=6 faces has dn elements. When the die is fair, that means each outcome of one roll has probability 1/d and independence means each of these outcomes will therefore have probability (1/d)n: that is, they have a uniform distribution Pd,n.

Suppose you have devised some procedure t that either determines m outcomes of a c(=150)-sided die–that is, an element of Ω(c,m)–or else reports failure (which means you will have to repeat it in order to obtain an outcome). That is,

t:Ω(d,n)→Ω(c,m)∪{Failure}.

Let F be the probability t results in failure and note that F is some integral multiple of d−n, say

F=Pr

(For future reference, note that the expected number of times t must be invoked before not failing is 1/(1-F).)

The requirement that these outcomes in \Omega(c,m) be uniform and independent *conditional* on t not reporting failure means that t preserves probability in the sense that for every event \mathcal{A}\subset\Omega(c,m),

\frac{\mathbb{P}_{d,n}\left(t^{*}\mathcal{A}\right)}{1-F}= \mathbb{P}_{c,m}\left(\mathcal{A}\right) \tag{1}

where

t^{*}\left(\mathcal A\right) = \{\omega\in\Omega\mid t(\omega)\in\mathcal{A}\}

is the set of die rolls that the procedure t assigns to the event \mathcal A.

Consider an atomic event \mathcal A = \{\eta\}\subset\Omega(c,m), which must have probability c^{-m}. Let t^{*}\left(\mathcal A\right) (the dice rolls associated with \eta) have N_\eta elements. (1) becomes

\frac{N_\eta d^{-n}}{1 – N_F d^{-n}} = \frac{\mathbb{P}_{d,n}\left(t^{*}\mathcal{A}\right)}{1-F}= \mathbb{P}_{c,m}\left(\mathcal{A}\right) = c^{-m}.\tag{2}

**It is immediate that the N_\eta are all equal to some integer N.** It remains only to find the most efficient procedures t. The expected number of non-failures *per roll of the c sided die* is

\frac{1}{m}\left(1 – F\right).

**There are two immediate and obvious implications.** One is that if we can keep F small as m grows large, then the effect of reporting a failure is asymptotically zero. The other is that for any given m (the number of rolls of the c-sided die to simulate), we want to make F as small as possible.

Let’s take a closer look at (2) by clearing the denominators:

N c^m = d^n – N_F \gt 0.

This makes it obvious that in a given context (determined by c,d,n,m), F is made as small as possible by making d^n-N_F equal the largest multiple of c^m that is less than or equal to d^n. We may write this in terms of the greatest integer function (or “floor”) \lfloor*\rfloor as

N = \bigg\lfloor \frac{d^n}{c^m} \bigg\rfloor.

Finally, it is clear that N ought to be as small as possible for highest efficiency, because it measures *redundancy* in t. Specifically, the expected number of rolls of the d-sided die needed to produce one roll of the c-sided die is

N \times \frac{n}{m} \times \frac{1}{1-F}.

**Thus, our search for high-efficiency procedures ought to focus on the cases where d^n is equal to, or just barely greater than, some power c^m.**

The analysis ends by showing that for given d and c, there is a sequence of multiples (n,m) for which this approach approximates perfect efficiency. This amounts to finding (n,m) for which d^n/c^m \ge 1 approaches N=1 in the limit (automatically guaranteeing F\to 0). One such sequence is obtained by taking n=1,2,3,\ldots and determining

m = \bigg\lfloor \frac{n\log d}{\log c} \bigg\rfloor.\tag{3}

The proof is straightforward.

This all means that when we are willing to roll the original d-sided die a sufficiently large number of times n, we can expect to simulate nearly \log d / \log c = \log_c d outcomes of a c-sided die per roll. Equivalently,

It is possible to simulate a large number m of independent rolls of a c-sided die using a fair d-sided die using an average of \log(c)/\log(d) + \epsilon = \log_d(c) + \epsilon rolls per outcome where \epsilon can be made arbitrarily small by choosing m sufficiently large.

### Examples and algorithms

In the question, d=6 and c=150, whence

\log_d(c) = \frac{\log(c)}{\log(d)} \approx 2.796489.

Thus, **the best possible procedure will require, on average, at least 2.796489 rolls of a d6 to simulate each d150 outcome.**

The analysis shows how to do this. We don’t need to resort to number theory to carry it out: we can just tabulate the powers d^n=6^n and the powers c^m=150^m and compare them to find where c^m \le d^n are close. This brute force calculation gives (n,m) pairs

(n,m) \in \{(3,1), (14,5), \ldots\}

for instance, corresponding to the numbers

(6^n, 150^m) \in \{(216,150), (78364164096,75937500000), \ldots\}.

In the first case t would associate 216-150=66 of the outcomes of three rolls of a `d6`

to Failure and the other 150 outcomes would each be associated with a single outcome of a `d150`

.

In the second case t would associate 78364164096-75937500000 of the outcomes of 14 rolls of a `d6`

to Failure — about 3.1% of them all — and otherwise would output a sequence of 5 outcomes of a `d150`

.

**A simple algorithm to implement t** labels the faces of the d-sided die with the numerals 0,1,\ldots, d-1 and the faces of the c-sided die with the numerals 0,1,\ldots, c-1. The n rolls of the first die are interpreted as an n-digit number in base d. This is converted to a number in base c. If it has at most m digits, the sequence of the last m digits is the output. Otherwise, t returns Failure by invoking itself recursively.

**For much longer sequences, you can find suitable pairs** (n,m) by considering every other convergent n/m of the continued fraction expansion of x=\log(c)/\log(d). The theory of continued fractions shows that these convergents alternate between being less than x and greater than it (assuming x is not already rational). Choose those that are less than x.

In the question, the first few such convergents are

3, 14/5, 165/59, 797/285, 4301/1538, 89043/31841, 279235/99852, 29036139/10383070 \ldots.

In the last case, a sequence of 29,036,139 rolls of a `d6`

will produce a sequence of 10,383,070 rolls of a `d150`

with a failure rate less than 2\times 10^{-8}, for an efficiency of 2.79649–indistinguishable from the asymptotic limit.

**Attribution***Source : Link , Question Author : Sycorax , Answer Author : whuber*