Do low-discrepancy sequences work in discrete spaces?

Low-discrepancy sequences in a real space ($[0,1]^n$) seem like a really excellent tool for evenly sampling a sample space. As far as I can tell, they generalise well to any real space, if you use an appropriate map (eg. $[0,1]\to[a,b]$ linear map).

Do such sequences generalise to discrete spaces? eg. if I have a space that has only two elements in each dimension (eg. boolean switches), can I just map $[0,0.5]\to 0;\ (0.5,1]\to 1$? What about for dimensions with more elements (eg. a 4-state switch?). And for spaces with different number of states in each dimension?

My intuition says that this might work ok, especially if the sub sequence is longish, but that it might work better for some sequences than for others, depending on the number of states (eg. a Halton sequence might have odd interactions with a dimension with a prime number of states, or a Sobol’ sequence might only work for dimensions with $2^n$ elements). But I have done no testing.

If this doesn’t work, why not?

Answer

The short answer is: Yes! It can work, and is as simple as multiplying the vector $t_n \in (0,1)^d$ by an integer $m$, and the taking the integer part of each of its components.

The longer answer is that your intuition is correct, that in practice it has mixed results depending on the choice of:

  • which sequence you choose (Halton, Sobol, etc..)
  • the basis parameters (eg, 2,3,5,…)
  • and to a lesser degree, the value of $m$.

However, I have recently written a detailed blog post “The unreasonable effectiveness of quasirandom sequences, on how to easily create an open-ended low discrepancy sequence in arbitrary dimensions, that is much more amenable to discretization than existing existing low discrepancy sequences, such as the Halton and Kronecker sequences.

The section in the post called “Covering” specifically deals with your question of discretizing the low discrepancy sequences.

In the following image squares (which indicates a unique integer latttice point) with less red implies a more even distribution, as each red square indicates that the cell does not contain a blue point. One can clearly see how even the $R$-sequence distributes points compared to other contemporary methods.

Image: Discrete Low discrepancy Sequences in two dimensions

The solution is an additive recurrence method (modulo 1) which generalizes the 1-Dimensional problem whose solution depends on the Golden Ratio. The solution to the $d$-dimensional problem, depends on a special constant $\phi_d$, where $\phi_d$ is the value of smallest, positive real-value of $x$ such that
$$ x^{d+1}\;=x+1$$

For $d=1$,  $ \phi_1 = 1.618033989… $, which is the canonical golden ratio.

For $d=2$, $ \phi_2 = 1.3247179572… $, which  is often called the plastic constant, and has some beautiful properties. This value was conjectured to most likely be the optimal value for a related two-dimensional problem [Hensley, 2002].

Jacob Rus has posted a beautiful visualization of this 2-dimensional low discrepancy sequence, which can be found here.

With this special constant in hand, the calculation of the $n$-th term is now extremely simple and fast to calculate:

$$ R: \mathbf{t}_n = \pmb{\alpha}_0 + n \pmb{\alpha} \; (\textrm{mod} \; 1),  \quad n=1,2,3,… $$
$$ \textrm{where} \quad \pmb{\alpha} =(\frac{1}{\phi_d}, \frac{1}{\phi_d^2},\frac{1}{\phi_d^3},…\frac{1}{\phi_d^d}), $$

Of course, the reason this is called a recurrence sequence is because the above definition is equivalent to
$$ R: \mathbf{t}_{n+1} = \mathbf{t}_{n} + \pmb{\alpha} \; (\textrm{mod} \; 1) $$

In nearly all instances, the choice of $\pmb{\alpha}_0 $ does not change the key characteristics, and so for reasons of obvious simplicity, $\pmb{\alpha}_0 =\pmb{0}$ is the usual choice. However, there are some arguments, relating to symmetry, that suggest that $\pmb{\alpha}_0=\pmb{1/2}$ is a better choice.

The Python code is

# Use Newton-Rhapson-Method
def gamma(d):
    x=1.0000
    for i in range(20):
        x = x-(pow(x,d+1)-x-1)/((d+1)*pow(x,d)-1)
    return x

d=5
n=1000

# m can be any number.
# In the diagram above it is chosen to be exactly sqrt of n, 
# simply to to make the visualization more intuitive 
# so that ideally each cell should have exactly one dot.
m=10

g = gamma(d)
alpha = np.zeros(d)                 
for j in range(d):
    alpha[j] = pow(1/g,j+1) %1
z = np.zeros((n, d))    
c = (np.zeros((n,d)).astype(int)  

for i in range(n):
    z = (0.5 + alpha*(i+1)) %1
    c = (np.floor(m *z)).astype(int)
print(c)

Hope that helps!

Attribution
Source : Link , Question Author : naught101 , Answer Author : Martin Roberts

Leave a Comment