First of all I’m not sure where this question should be posted. I’m asking if a statistics problem is NP-Complete and if not to solve it programmatically. I’m posting it here because the statistics problem is the center point.
I’m trying to find a better formula for solving a problem. The problem is: if I have 4d6 (4 ordinary 6 sided dice) and roll them all at once, remove a die with the lowest number (called “dropping”), then sum the remaining 3, what is the probability of each possible result? I know the answer is this:
Sum (Frequency): Probability 3 (1): 0.0007716049 4 (4): 0.0030864198 5 (10): 0.0077160494 6 (21): 0.0162037037 7 (38): 0.0293209877 8 (62): 0.0478395062 9 (91): 0.0702160494 10 (122): 0.0941358025 11 (148): 0.1141975309 12 (167): 0.1288580247 13 (172): 0.1327160494 14 (160): 0.1234567901 15 (131): 0.1010802469 16 (94): 0.0725308642 17 (54): 0.0416666667 18 (21): 0.0162037037
The average is 12.24 and standard deviation is 2.847.
I found the above answer by brute force and don’t know how or if there is a formula for it. I suspect this problem is NP-Complete and therefore can only be solved by brute force. It might be possible to get all probabilities of 3d6 (3 normal 6 sided dice) then skew each of them upward. This would be faster than brute force because I have a fast formula when all dice are kept.
I programmed the formula for keeping all dice in college. I had asked my statistics professor about it and he found this page, which he then explained to me. There is a big performance difference between this formula and brute force: 50d6 took 20 seconds but 8d6 drop lowest crashes after 40 seconds (chrome runs out of memory).
Is this problem NP-Complete? If yes please provide a proof, if no please provide a non-brute force formula to solve it.
Note that I don’t know much about NP-Complete so I might be thinking of NP, NP-Hard, or something else. The proof for NP-Completeness is useless to me the only reason why I ask for it is to prevent people from guessing. And please bare with me as it’s been a long time since I worked on this: I don’t remember statistics as well as I might need to solve this.
Ideally I’m looking for a more generic formula for X number of dice with Y sides when N of them are dropped but am starting with something much more simple.
Edit:
I would also prefer the formula to output frequencies but it is acceptable to only output probabilities.
For those interested I have programmed whuber’s answer in JavaScript on my GitHub (in this commit only the tests actually use the functions defined).
Answer
Solution
Let there be n=4 dice each giving equal chances to the outcomes 1,2,…,d=6. Let K be the minimum of the values when all n dice are independently thrown.
Consider the distribution of the sum of all n values conditional on K. Let X be this sum. The generating function for the number of ways to form any given value of X, given that the minimum is at least k, is
f(n,d,k)(x)=xk+xk+1+⋯+xd=xk1−xd−k+11−x.
Since the dice are independent, the generating function for the number of ways to form values of X where all n dice show values of k or greater is
f(n,d,k)(x)n=xkn(1−xd−k+11−x)n.
This generating function includes terms for the events where K exceeds k, so we need to subtract them off. Therefore the generating function for the number of ways to form values of X, given K=k, is
f(n,d,k)(x)n−f(n,d,k+1)(x)n.
Noting that the sum of the n−1 highest values is the sum of all values minus the smallest, equal to X−K. The generating function therefore needs to be divided by k. It becomes a probability generating function upon multiplying by the common chance of any combination of dice, (1/d)n:
d−nd∑k=1x−k(f(n,d,k)(x)n−f(n,d,k+1)(x)n).
Since all the polynomial products and powers can be computed in O(nlogn) operations (they are convolutions and therefore can be carried out with the discrete Fast Fourier Transform), the total computational effort is O(knlogn). In particular, it is a polynomial time algorithm.
Example
Let’s work through the example in the question with n=4 and d=6.
Formula (1) for the PGF of X conditional on K≥k gives
f(4,6,1)(x)=x+x2+x3+x4+x5+x6f(4,6,2)(x)=x2+x3+x4+x5+x6…f(4,6,5)(x)=x5+x6f(4,6,6)(x)=x6f(4,6,7)(x)=0.
Raising them to the n=4 power as in formula (2) produces
f(4,6,1)(x)4=x4+4x5+10x6+⋯+4x23+x24f(4,6,2)(x)4=x8+4x9+10x10+⋯+4x23+x24…f(4,6,5)(x)4=x20+4x21+6x22+4x23+x24f(4,6,6)(x)4=x24f(4,6,7)(x)4=0
Their successive differences in formula (3) are
f(4,6,1)(x)4−f(4,6,2)(x)4=x4+4x5+10x6+⋯+12x18+4x19f(4,6,2)(x)4−f(4,6,3)(x)4=x8+4x9+10x10+⋯+4x20…f(4,6,5)(x)4−f(4,6,6)(x)4=x20+4x21+6x22+4x23f(4,6,6)(x)4−f(4,6,7)(x)4=x24.
The resulting sum in formula (4) is
6−4(x3+4x4+10x5+21x6+38x7+62x8+91x9+122x10+148x11+167x12+172x13+160x14+131x15+94x16+54x17+21x18).
For example, the chance that the top three dice sum to 14 is the coefficient of x14, equal to
6−4×160=10/81=0.123456790123456….
It is in perfect agreement with the probabilities quoted in the question.
By the way, the mean (as calculated from this result) is 15869/1296≈12.244598765… and the standard deviation is √13612487/1679616≈2.8468444.
A similar (unoptimized) calculation for n=400 dice instead of n=4 took less than a half a second, supporting the contention that this is not a computationally demanding algorithm. Here is a plot of the main part of the distribution:
Since the minimum K is highly likely to equal 1 and the sum X will be extremely close to having a Normal(400×7/2,400×35/12) distribution (whose mean is 1400 and standard deviation is approximately 34.1565), the mean must be extremely close to 1400−1=1399 and the standard deviation extremely close to 34.16. This nicely describes the plot, indicating it is likely correct. In fact, the exact calculation gives a mean of around 2.13×10−32 greater than 1399 and a standard deviation around 1.24×10−31 less than √400×35/12.
Attribution
Source : Link , Question Author : SkySpiral7 , Answer Author : whuber