sampling cost of $O(d)$ versus $O(2^d)$

I came across the following simulation problem: given a set $\{\omega_1,\ldots,\omega_d\}$ of known real numbers, a distribution on $\{-1,1\}^d$ is defined by
$$P(X=(x_1,\ldots,x_d)) \propto (x_1\omega_1+\cdots+x_d\omega_d)^+$$
where $(z)^+$ denotes the positive part of $z$. While I can think of a Metropolis-Hastings sampler targeting this distribution, I wonder if there exists an efficient direct sampler, taking advantage of the large number of zero probabilities to decrease the order of the algorithm from $O(2^d)$ to $O(d)$.

Answer

Here’s a fairly obvious recursive sampler that’s $O(d)$ in the best case (in terms of the weights $\omega_i$), but exponential in the worst case.

Suppose we’ve already selected $x_1,\ldots,x_{i-1}$, and wish to choose $x_i$. We need to compute
$$w(x_1,\ldots,x_{i-1},x_i) = \sum_{x_{i+1}\in\{-1,1\}} \cdots \sum_{x_d\in\{-1,1\}} \left(\sum_{j=1}^d \omega_j x_j\right)^+$$
and choose $x_i = 1$ with probability $\dfrac{w(x_1,\ldots,x_{i-1},1)}{w(x_1,\ldots,x_{i-1},1)+w(x_1,\ldots,x_{i-1},-1)}$.
The denominator will be nonzero for any valid choice of samples $x_1,\ldots,x_{i-1}$.
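
For concreteness, here is a minimal Python sketch of that selection loop. The names `sample_sequential` and the helper `w` are my own; `w(prefix)` is assumed to return the unnormalised weight just defined, and one way to compute it is sketched further down.

```python
import random

def sample_sequential(omega, w):
    """Minimal sketch: draw one x in {-1, 1}^d coordinate by coordinate.

    `w(prefix)` is assumed to return the unnormalised weight of `prefix`,
    i.e. the sum of (omega . x)^+ over all completions of that prefix.
    """
    d = len(omega)
    prefix = ()
    for _ in range(d):
        w_plus = w(prefix + (1,))
        w_minus = w(prefix + (-1,))
        # Nonzero for any prefix reachable with positive probability.
        p_plus = w_plus / (w_plus + w_minus)
        prefix += (1,) if random.random() < p_plus else (-1,)
    return list(prefix)
```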


Now, of course, the question is how to compute $w(x_1,\ldots,x_i)$.

If we have that $C := \sum_{j=1}^i \omega_j x_j \geq \sum_{j=i+1}^d |\omega_j|$, then $\omega^\top x \geq 0$ for any $x$ with leading entries $x_{1:i}$, and so $w$ becomes:
$$\sum_{x_{i+1}}\cdots\sum_{x_d} \omega^\top x
= \omega^\top\left(\sum_{x_{i+1}}\cdots\sum_{x_d} x\right)
= \sum_{j=1}^i \omega_j \underbrace{\left(\sum_{x_{i+1}}\cdots\sum_{x_d} x_j\right)}_{2^{d-i}\,x_j}
+ \sum_{j=i+1}^d \omega_j \underbrace{\left(\sum_{x_{i+1}}\cdots\sum_{x_d} x_j\right)}_{0}
= 2^{d-i}\, C.$$

In the opposite case, $C \leq -\sum_{j=i+1}^d |\omega_j|$,
we have that $\omega^\top x \leq 0$ and so $w(x_1,\ldots,x_i) = 0$.

Otherwise, we must recurse, using $w(x_1,\ldots,x_i) = w(x_1,\ldots,x_i,1) + w(x_1,\ldots,x_i,-1)$.
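
Here is a hedged Python sketch of such a `w`, covering the two “nice” cases and the recursive fallback, memoised so each prefix is only evaluated once. The name `make_w` and the exposed `cache` attribute are my own additions, not part of the original answer.

```python
def make_w(omega):
    """Sketch of the recursive weight w(prefix) with the three cases above.

    `omega` is the list of weights; `prefix` is a tuple of already-chosen
    entries (x_1, ..., x_i), each +1 or -1.
    """
    d = len(omega)
    # suffix_abs[i] = sum of |omega_j| over the not-yet-chosen coordinates,
    # i.e. |omega[i]| + ... + |omega[d-1]| in 0-indexed terms.
    suffix_abs = [0.0] * (d + 1)
    for j in range(d - 1, -1, -1):
        suffix_abs[j] = suffix_abs[j + 1] + abs(omega[j])

    cache = {}  # memoise every prefix we ever evaluate

    def w(prefix):
        if prefix in cache:
            return cache[prefix]
        i = len(prefix)
        C = sum(o * x for o, x in zip(omega, prefix))
        if C >= suffix_abs[i]:
            # "Nice" case 1: omega.x >= 0 for every completion, so the
            # positive part never clips and the sum collapses to 2^(d-i) * C.
            result = 2 ** (d - i) * C
        elif C <= -suffix_abs[i]:
            # "Nice" case 2: omega.x <= 0 for every completion, so every
            # term is clipped to zero.
            result = 0.0
        else:
            # Otherwise recurse on both children.
            result = w(prefix + (1,)) + w(prefix + (-1,))
        cache[prefix] = result
        return result

    w.cache = cache  # expose the cache so its size can be inspected
    return w
```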

Assume that memory isn’t an issue and that we can cache all sub-computations in $w(1)$, $w(-1)$ in a tree – up to the point that we hit one of the “nice” cases, after which any calls take constant time. (We’ll need to compute this whole tree anyway to select $x_1$.) Then, once this tree of $w$ computations is built, the sampler will take only $O(d)$ time. The question is how long it takes to build the tree, or equivalently how large it is.
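
Putting the two sketches together, hypothetical usage might look like the following; the first draw pays for building the cached tree, and later draws reuse it at $O(d)$ cost each:

```python
omega = [2.0, -0.7, 0.5, 0.3, -0.1]   # made-up weights, purely illustrative
w = make_w(omega)
samples = [sample_sequential(omega, w) for _ in range(1000)]
```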


We will of course hit the “nice” cases faster if the $\omega_i$ are sorted by magnitude, $|\omega_1| \geq |\omega_2| \geq \cdots \geq |\omega_d|$.

In the best case, $|\omega_1| > \sum_{j=2}^d |\omega_j|$. Then we hit a “nice” case immediately for either $w(1)$ or $w(-1)$, so $w$ tree construction takes constant time, and the whole sampler takes $O(d)$ time.

In the worst (sorted) case, $\omega_1 = \omega_2 = \cdots = \omega_d$. Then the question is: how big is the total tree?

Well, the first paths to terminate are of course $(1,1,\ldots,1)$ and $(-1,-1,\ldots,-1)$ of length $d/2$. The tree is therefore complete up to that depth, and so contains at least $2^{d/2}$ nodes. (It has more; you can probably find the exact count with an argument like the ones used in gambler’s ruin problems, but I couldn’t find it in two minutes of Googling and don’t particularly care – $2^{d/2}$ is bad enough….)
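
As a rough sanity check on that growth rate (using the hypothetical `make_w` sketch above, which exposes its cache), one can count how many prefixes get evaluated in the all-equal-weights case:

```python
# With all weights equal, the number of cached prefixes grows exponentially
# in d, consistent with the 2^(d/2) lower bound argued above.
for d in (8, 12, 16):
    w = make_w([1.0] * d)
    w(())  # force the whole tree of sub-computations to be built
    print(d, len(w.cache))
```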

If your setting has only a few very large $\omega_i$, this is probably a reasonably practical approach. If the $\omega_i$ are all of similar magnitude, it’s probably still exponential and too expensive for large $d$.

Attribution
Source : Link , Question Author : Xi’an , Answer Author : Xi’an
