Probability of a run of k successes in a sequence of n Bernoulli trials

I’m trying to find the probability of getting 8 trials in a row correct in a block of 25 trials, you have 8 total blocks (of 25 trials) to get 8 trials correct in a row. The probability of getting any trial correct based on guessing is 1/3, after getting 8 in a row correct the blocks will end (so getting more than 8 in a row correct is technically not possible). How would I go about finding the probability of this occurring? I’ve been thinking along the lines of using (1/3)^8 as the probability of getting 8 in a row correct, there are 17 possible chances to get 8 in a row in a block of 25 trials, if I multiply 17 possibilities * 8 blocks I get 136, would 1-(1-(1/3)^8)^136 give me the likelihood of getting 8 in a row correct in this situation or am I missing something fundamental here?

Answer

By keeping track of things you can get an exact formula.

Let p=1/3 be the probability of success and k=8 be the number of successes in a row you want to count. These are fixed for the problem. Variable values are m, the number of trials left in the block; and j, the number of successive successes already observed. Let the chance of eventually achieving k successes in a row before m trials are exhausted be written fp,k(j,m). We seek f1/3,8(0,25).

Suppose we have just seen our jth success in a row with m>0 trials to go. The next trial is either a success, with probability p–in which case j is increased to j+1–; or else it is a failure, with probability 1p–in which case j is reset to 0. In either case, m decreases by 1. Whence

fp,k(j,m)=pfp,k(j+1,m1)+(1p)fp,k(0,m1).

As starting conditions we have the obvious results fp,k(k,m)=1 for m0 (i.e., we have already seen k in a row) and fp,k(j,m)=0 for kj>m (i.e., there aren’t enough trials left to get k in a row). It is now fast and straightforward (using dynamic programming or, because this problem’s parameters are so small, recursion) to compute

fp,8(0,25)=18p817p945p16+81p1736p18.

When p=1/3 this yields 80897/430467210.0018793.

Relatively fast R code to simulate this is

hits8 <- function() {
    x <- rbinom(26, 1, 1/3)                # 25 Binomial trials
    x[1] <- 0                              # ... and a 0 to get started with `diff`
    if(sum(x) >= 8) {                      # Are there at least 8 successes?
        max(diff(cumsum(x), lag=8)) >= 8   # Are there 8 successes in a row anywhere?
    } else {
        FALSE                              # Not enough successes for 8 in a row
    }
}
set.seed(17)
mean(replicate(10^5, hits8()))

After 3 seconds of calculation, the output is 0.00213. Although this looks high, it’s only 1.7 standard errors off. I ran another 106 iterations, yielding 0.001867: only 0.3 standard errors less than expected. (As a double-check, because an earlier version of this code had a subtle bug, I also ran 400,000 iterations in Mathematica, obtaining an estimate of 0.0018475.)

This result is less than one-tenth the estimate of 1(1(1/3)8)1360.0205 in the question. But perhaps I have not fully understood it: another interpretation of “you have 8 total blocks … to get 8 trials correct in a row” is that the answer being sought equals 1(1f1/3,8(0,25))8)=0.0149358….

Attribution
Source : Link , Question Author : AcidNynex , Answer Author : whuber

Leave a Comment