I want to sample according to a density
where c and d are strictly positive.
(Motivation: This could be useful for Gibbs sampling when the shape parameter of a Gamma density has a uniform prior.)
Does anyone know how to sample from this density easily? Maybe it is standard and just something I don’t know about?
I can think of a stupid rejection sampliing algorithm that will more or less work (find the mode a∗ of f, sample (a,u) from uniform in a big box [0,10a∗]×[0,f(a∗)] and reject if u>f(a)), but (i) it is not at all efficient and (ii) f(a∗) will be too big for a computer to handle easily for even moderately large c and d. (Note that the mode for large c and d is approximately at a=cd.)
Thanks in advance for any help!
Rejection sampling will work exceptionally well when cd≥exp(5) and is reasonable for cd≥exp(2).
To simplify the math a little, let k=cd, write x=a, and note that
for x≥1. Setting x=u3/2 gives
for u≥1. When k≥exp(5), this distribution is extremely close to Normal (and gets closer as k gets larger). Specifically, you can
Find the mode of f(u) numerically (using, e.g., Newton-Raphson).
Expand logf(u) to second order about its mode.
This yields the parameters of a closely approximate Normal distribution. To high accuracy, this approximating Normal dominates f(u) except in the extreme tails. (When k<exp(5), you may need to scale the Normal pdf up a little bit to assure domination.)
Having done this preliminary work for any given value of k, and having estimated a constant M>1 (as described below), obtaining a random variate is a matter of:
Draw a value u from the dominating Normal distribution g(u).
If u<1 or if a new uniform variate X exceeds f(u)/(Mg(u)), return to step 1.
The expected number of evaluations of f due to the discrepancies between g and f is only slightly greater than 1. (Some additional evaluations will occur due to rejections of variates less than 1, but even when k is as low as 2 the frequency of such occurrences is small.)
This plot shows the logarithms of g and f as a function of u for k=exp(5). Because the graphs are so close, we need to inspect their ratio to see what’s going on:
This displays the log ratio log(exp(0.004)g(u)/f(u)); the factor of M=exp(0.004) was included to assure the logarithm is positive throughout the main part of the distribution; that is, to assure Mg(u)≥f(u) except possibly in regions of negligible probability. By making M sufficiently large you can guarantee that M⋅g dominates f in all but the most extreme tails (which have practically no chance of being chosen in a simulation anyway). However, the larger M is, the more frequently rejections will occur. As k grows large, M can be chosen very close to 1, which incurs practically no penalty.
A similar approach works even for k>exp(2), but fairly large values of M may be needed when exp(2)<k<exp(5), because f(u) is noticeably asymmetric. For instance, with k=exp(2), to get a reasonably accurate g we need to set M=1:
The upper red curve is the graph of log(exp(1)g(u)) while the lower blue curve is the graph of log(f(u)). Rejection sampling of f relative to exp(1)g will cause about 2/3 of all trial draws to be rejected, tripling the effort: still not bad. The right tail (u>10 or x>103/2∼30) will be under-represented in the rejection sampling (because exp(1)g no longer dominates f there), but that tail comprises less than exp(−20)∼10−9 of the total probability.
To summarize, after an initial effort to compute the mode and evaluate the quadratic term of the power series of f(u) around the mode–an effort that requires a few tens of function evaluations at most–you can use rejection sampling at an expected cost of between 1 and 3 (or so) evaluations per variate. The cost multiplier rapidly drops to 1 as k=cd increases beyond 5.
Even when just one draw from f is needed, this method is reasonable. It comes into its own when many independent draws are needed for the same value of k, for then the overhead of the initial calculations is amortized over many draws.
@Cardinal has asked, quite reasonably, for support of some of the hand-waving analysis in the forgoing. In particular, why should the transformation x=u3/2 make the distribution approximately Normal?
In light of the theory of Box-Cox transformations, it is natural to seek some power transformation of the form x=uα (for a constant α, hopefully not too different from unity) that will make a distribution “more” Normal. Recall that all Normal distributions are simply characterized: the logarithms of their pdfs are purely quadratic, with zero linear term and no higher order terms. Therefore we can take any pdf and compare it to a Normal distribution by expanding its logarithm as a power series around its (highest) peak. We seek a value of α that makes (at least) the third power vanish, at least approximately: that is the most we can reasonably hope that a single free coefficient will accomplish. Often this works well.
But how to get a handle on this particular distribution? Upon effecting the power transformation, its pdf is
Take its logarithm and use Stirling’s asymptotic expansion of log(Γ):
(for small values of c, which is not constant). This works provided α is positive, which we will assume to be the case (for otherwise we cannot neglect the remainder of the expansion).
Compute its third derivative (which, when divided by 3!, will be the coefficient of the third power of u in the power series) and exploit the fact that at the peak, the first derivative must be zero. This simplifies the third derivative greatly, giving (approximately, because we are ignoring the derivative of c)
When k is not too small, u will indeed be large at the peak. Because α is positive, the dominant term in this expression is the 2α power, which we can set to zero by making its coefficient vanish:
That’s why α=3/2 works so well: with this choice, the coefficient of the cubic term around the peak behaves like u−3, which is close to exp(−2k). Once k exceeds 10 or so, you can practically forget about it, and it’s reasonably small even for k down to 2. The higher powers, from the fourth on, play less and less of a role as k gets large, because their coefficients grow proportionately smaller, too. Incidentally, the same calculations (based on the second derivative of log(f(u)) at its peak) show the standard deviation of this Normal approximation is slightly less than 23exp(k/6), with the error proportional to exp(−k/2).