# Formula for dropping dice (non-brute force)

First of all I’m not sure where this question should be posted. I’m asking if a statistics problem is NP-Complete and if not to solve it programmatically. I’m posting it here because the statistics problem is the center point.

I’m trying to find a better formula for solving a problem. The problem is: if I have 4d6 (4 ordinary 6 sided dice) and roll them all at once, remove a die with the lowest number (called “dropping”), then sum the remaining 3, what is the probability of each possible result? I know the answer is this:

Sum (Frequency): Probability
3   (1):         0.0007716049
4   (4):         0.0030864198
5   (10):        0.0077160494
6   (21):        0.0162037037
7   (38):        0.0293209877
8   (62):        0.0478395062
9   (91):        0.0702160494
10  (122):       0.0941358025
11  (148):       0.1141975309
12  (167):       0.1288580247
13  (172):       0.1327160494
14  (160):       0.1234567901
15  (131):       0.1010802469
16  (94):        0.0725308642
17  (54):        0.0416666667
18  (21):        0.0162037037


The average is 12.24 and standard deviation is 2.847.

I found the above answer by brute force and don’t know how or if there is a formula for it. I suspect this problem is NP-Complete and therefore can only be solved by brute force. It might be possible to get all probabilities of 3d6 (3 normal 6 sided dice) then skew each of them upward. This would be faster than brute force because I have a fast formula when all dice are kept.

I programmed the formula for keeping all dice in college. I had asked my statistics professor about it and he found this page, which he then explained to me. There is a big performance difference between this formula and brute force: 50d6 took 20 seconds but 8d6 drop lowest crashes after 40 seconds (chrome runs out of memory).

Is this problem NP-Complete? If yes please provide a proof, if no please provide a non-brute force formula to solve it.

Note that I don’t know much about NP-Complete so I might be thinking of NP, NP-Hard, or something else. The proof for NP-Completeness is useless to me the only reason why I ask for it is to prevent people from guessing. And please bare with me as it’s been a long time since I worked on this: I don’t remember statistics as well as I might need to solve this.

Ideally I’m looking for a more generic formula for X number of dice with Y sides when N of them are dropped but am starting with something much more simple.

### Edit:

I would also prefer the formula to output frequencies but it is acceptable to only output probabilities.

For those interested I have programmed whuber’s answer in JavaScript on my GitHub (in this commit only the tests actually use the functions defined).

### Solution

Let there be $n=4$ dice each giving equal chances to the outcomes $1, 2, \ldots, d=6$. Let $K$ be the minimum of the values when all $n$ dice are independently thrown.

Consider the distribution of the sum of all $n$ values conditional on $K$. Let $X$ be this sum. The generating function for the number of ways to form any given value of $X$, given that the minimum is at least $k$, is

Since the dice are independent, the generating function for the number of ways to form values of $X$ where all $n$ dice show values of $k$ or greater is

This generating function includes terms for the events where $K$ exceeds $k$, so we need to subtract them off. Therefore the generating function for the number of ways to form values of $X$, given $K=k$, is

Noting that the sum of the $n-1$ highest values is the sum of all values minus the smallest, equal to $X-K$. The generating function therefore needs to be divided by $k$. It becomes a probability generating function upon multiplying by the common chance of any combination of dice, $(1/d)^n$:

Since all the polynomial products and powers can be computed in $O(n\log n)$ operations (they are convolutions and therefore can be carried out with the discrete Fast Fourier Transform), the total computational effort is $O(k\,n\log n)$. In particular, it is a polynomial time algorithm.

### Example

Let’s work through the example in the question with $n=4$ and $d=6$.

Formula $(1)$ for the PGF of $X$ conditional on $K\ge k$ gives

Raising them to the $n=4$ power as in formula $(2)$ produces

Their successive differences in formula $(3)$ are

The resulting sum in formula $(4)$ is

For example, the chance that the top three dice sum to $14$ is the coefficient of $x^{14}$, equal to

It is in perfect agreement with the probabilities quoted in the question.

By the way, the mean (as calculated from this result) is $15869/1296 \approx 12.244598765\ldots$ and the standard deviation is $\sqrt{13\,612\,487/1\,679\,616}\approx 2.8468444$.

A similar (unoptimized) calculation for $n=400$ dice instead of $n=4$ took less than a half a second, supporting the contention that this is not a computationally demanding algorithm. Here is a plot of the main part of the distribution:

Since the minimum $K$ is highly likely to equal $1$ and the sum $X$ will be extremely close to having a Normal$(400\times 7/2, 400\times 35/12)$ distribution (whose mean is $1400$ and standard deviation is approximately $34.1565$), the mean must be extremely close to $1400-1=1399$ and the standard deviation extremely close to $34.16$. This nicely describes the plot, indicating it is likely correct. In fact, the exact calculation gives a mean of around $2.13\times 10^{-32}$ greater than $1399$ and a standard deviation around $1.24\times 10^{-31}$ less than $\sqrt{400\times 35/12}$.