Global randomness and local randomness are different. Most
philosophical conceptions of randomness are global—because they are
based on the idea that “in the long run” a sequence looks truly
random, even if certain sub-sequences would not look random. In a
“truly” random sequence of numbers of sufficient length, for example,
it is probable there would be long sequences of nothing but zeros,
though on the whole the sequence might be random. Local randomness
refers to the idea that there can be minimum sequence lengths in which
random distributions are approximated. **Long stretches of the same
digits, even those generated by “truly” random processes, would
diminish the “local randomness” of a sample (it might only be locally
random for sequences of 10,000 digits; taking sequences of less than
1,000 might not appear random at all, for example).**
**A sequence exhibiting a pattern is not thereby proved not statistically random.** According to principles of Ramsey theory,
sufficiently large objects must necessarily contain a given
substructure (“complete disorder is impossible”).
I don’t quite understand the meanings of the two sentences in bold.
Does the first sentence mean that something can make a sequence locally
random at a longer length, and not locally random at a shorter length?
How does the example inside the parenthesis work?
Does the second sentence mean that a sequence exhibiting a pattern
cannot be proved to be not statistically random? Why?
The concept can be neatly illustrated by some executable code. We begin (in
R) by using a good pseudorandom number generator to create a sequence of 10,000 zeros and ones:
    set.seed(17)
    x <- floor(runif(10000, min=0, max=2))
This passes some basic random number tests. For instance, a t-test comparing the mean to $1/2$ has a p-value of $40.09\%$, which gives us no reason to reject the hypothesis that zeros and ones are equally likely.
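That claimed p-value can be reproduced directly; a minimal sketch, assuming the same seed and generator as above:

```r
# Reproduce the whole-sequence t-test (same x as in the snippet above).
set.seed(17)
x <- floor(runif(10000, min = 0, max = 2))
t.test(x - 1/2)  # p-value around 0.40: no evidence against a fair coin
```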
From these numbers we proceed to extract a subsequence of $1000$ successive values starting at the 5081st value:
    x0 <- x[1:1000 + 5080]
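This indexing relies on R’s vectorized addition: `1:1000 + 5080` adds 5080 to each of the indices 1 through 1000, selecting positions 5081 through 6080:

```r
# The positions selected by 1:1000 + 5080 (vectorized addition).
idx <- 1:1000 + 5080
range(idx)   # 5081 6080
length(idx)  # 1000
```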
If these are to look random, they should also pass the same random number tests. For instance, let’s test whether their mean is 1/2:
    > t.test(x0 - 1/2)

            One Sample t-test

    data:  x0 - 1/2
    t = 2.6005, df = 999, p-value = 0.009445
    alternative hypothesis: true mean is not equal to 0
    95 percent confidence interval:
     0.01006167 0.07193833
    sample estimates:
    mean of x
        0.041
The low p-value (less than 1%) strongly suggests the mean is significantly greater than $1/2$. Indeed, the cumulative sum of this subsequence has a strong upward trend:
That’s not random behavior!
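Because the plot itself is not reproduced here, the drift can also be checked numerically; a sketch assuming the x and x0 defined above:

```r
# The upward drift, in numbers (assumes x and x0 as defined above).
set.seed(17)
x  <- floor(runif(10000, min = 0, max = 2))
x0 <- x[1:1000 + 5080]
mean(x0)                   # about 0.541, per the t-test output above
tail(cumsum(x0 - 1/2), 1)  # the cumulative sum ends well above zero
```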
Comparing the original sequence (plotted as a cumulative sum) to this subsequence reveals what’s going on:
The long sequence indeed behaves like a random walk, as it should, but the particular subsequence I extracted contains the longest upward rise among all subsequences of the same length. It looks like I could have extracted other subsequences exhibiting “nonrandom” behavior, too, such as the one centered around $9000$ where approximately 20 ones in a row appear!
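A run like that can be located with run-length encoding; a sketch assuming the same sequence as above:

```r
# Find the longest run of identical digits (assumes the same x).
set.seed(17)
x <- floor(runif(10000, min = 0, max = 2))
r <- rle(x)
i <- which.max(r$lengths)           # index of the longest run
r$lengths[i]                        # its length
sum(r$lengths[seq_len(i - 1)]) + 1  # the position where it starts
r$values[i]                         # the repeated digit
```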
As these simple analyses have shown, no test can “prove” that a sequence is random. All we can do is test whether sequences deviate enough from the behaviors expected of random sequences to offer evidence that they are not random. This is how batteries of random-number tests work: they look for patterns highly unlikely to arise in random number sequences. Every once in a long while they will cause us to conclude that a truly random sequence of numbers does not appear random: we will reject it and try something else.
In the long run, though, just as we are all dead, any truly random number generator will generate every possible sequence of 1000 digits, and it will do so infinitely many times. What rescues us from a logical quandary is that we would have to wait an awfully long time for such an apparent aberration to occur.