# How to sample from $c^a d^{a-1} / \Gamma(a)$?

I want to sample according to a density

$$f(a) \propto \frac{c^a d^{a-1}}{\Gamma(a)}, \qquad a > 0,$$

where $c$ and $d$ are strictly positive.
(Motivation: This could be useful for Gibbs sampling when the shape parameter of a Gamma density has a uniform prior.)

Does anyone know how to sample from this density easily? Maybe it is standard and just something I don’t know about?

I can think of a stupid rejection sampling algorithm that will more or less work (find the mode $a^*$ of $f$, sample $(a,u)$ uniformly from a big box $[0,10a^*]\times [0,f(a^*)]$ and reject if $u>f(a)$), but (i) it is not at all efficient and (ii) $f(a^*)$ will be too big for a computer to handle easily for even moderately large $c$ and $d$. (Note that the mode for large $c$ and $d$ is approximately at $a=cd$.)
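Working with $\log f$ sidesteps the overflow problem (ii), though not the inefficiency (i). A sketch in Python of the box sampler just described, with the comparison $u \le f(a)$ done in log space (the parameter values and helper names are illustrative):

```python
import math, random

def log_f(a, c, d):
    # log of the unnormalized density c^a * d^(a-1) / Gamma(a)
    return a*math.log(c) + (a - 1)*math.log(d) - math.lgamma(a)

def naive_sample(c, d, rng=random):
    hi = 10*max(c*d, 1.0)  # box width: the mode is near a = c*d
    # bound log f on a coarse grid (plus a tiny margin) instead of evaluating
    # f(a*) directly, so nothing overflows
    log_f_max = max(log_f(hi*(i + 0.5)/2000, c, d) for i in range(2000)) + 1e-3
    while True:
        a = rng.uniform(1e-9, hi)
        # u <= f(a) with u ~ U(0, f_max)  <=>  log U(0,1) <= log f(a) - log f_max
        if math.log(1.0 - rng.random()) <= log_f(a, c, d) - log_f_max:
            return a

draws = [naive_sample(5.0, 4.0) for _ in range(1000)]
print(sum(draws)/len(draws))  # roughly c*d = 20 for these values
```

As anticipated, the acceptance rate is poor (a few percent here), which is exactly the inefficiency complaint (i).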

Thanks in advance for any help!

Rejection sampling will work exceptionally well when $cd \ge \exp(5)$ and is reasonable for $cd \ge \exp(2)$.

To simplify the math a little, let $k = cd$, write $x = a$, and note that

$$f(x) \propto \frac{k^x}{\Gamma(x)}\,dx$$

for $x \ge 1$. Setting $x = u^{3/2}$ gives

$$f(u) \propto \frac{k^{u^{3/2}}}{\Gamma(u^{3/2})}\, u^{1/2}\,du$$

for $u \ge 1$. When $k \ge \exp(5)$, this distribution is extremely close to Normal (and gets closer as $k$ gets larger). Specifically, you can

1. Find the mode of $f(u)$ numerically (using, e.g., Newton-Raphson).

2. Expand $\log f(u)$ to second order about its mode.

This yields the parameters of a closely approximating Normal distribution. To high accuracy, this approximating Normal dominates $f(u)$ except in the extreme tails. (When $k < \exp(5)$, you may need to scale the Normal pdf up a little bit to assure domination.)
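Steps 1 and 2 can be sketched as follows, assuming SciPy is available (`laplace_params` is a hypothetical name; the derivatives are those of $\log f(u) = u^{3/2}\log k - \log\Gamma(u^{3/2}) + \tfrac{1}{2}\log u$):

```python
import numpy as np
from scipy.special import digamma, polygamma

def laplace_params(k):
    logk = np.log(k)
    # first and second derivatives of log f(u)
    def d1(u):
        return 1.5*np.sqrt(u)*(logk - digamma(u**1.5)) + 0.5/u
    def d2(u):
        return (0.75/np.sqrt(u))*(logk - digamma(u**1.5)) \
               - 2.25*u*polygamma(1, u**1.5) - 0.5/u**2
    u = max(k**(2.0/3.0), 1.1)     # mode of f(x) is near x = k, so u ~ k^(2/3)
    for _ in range(50):            # Newton-Raphson on d1
        step = d1(u)/d2(u)
        u = max(u - step, 1.0)
        if abs(step) < 1e-12:
            break
    # second-order expansion at the mode => Normal(mean, sd)
    return u, np.sqrt(-1.0/d2(u))

mu, sigma = laplace_params(np.exp(5.0))
print(mu, sigma)
```

For $k = \exp(5)$ this gives a mean near $u \approx 28$ and a standard deviation near $1.5$, consistent with the plots discussed below.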

Having done this preliminary work for any given value of $k$, and having estimated a constant $M > 1$ (as described below), obtaining a random variate is a matter of:

1. Draw a value $u$ from the dominating Normal distribution $g(u)$.

2. If $u < 1$ or if a new uniform variate $X$ exceeds $f(u)/(Mg(u))$, return to step 1.

3. Set $x = u^{3/2}$.

The expected number of evaluations of $f$ due to the discrepancies between $g$ and $f$ is only slightly greater than 1. (Some additional evaluations will occur due to rejections of variates less than $1$, but even when $k$ is as low as $2$ the frequency of such occurrences is small.)
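The three steps above, together with the preliminary mode-finding, can be put into one self-contained sketch (again assuming SciPy; `make_sampler` and the safety factor `M` are illustrative, with $g$ scaled so its peak matches that of $f$):

```python
import numpy as np
from scipy.special import gammaln, digamma, polygamma
from scipy.stats import norm

def log_f(u, logk):
    # log f(u) = u^(3/2) log k - log Gamma(u^(3/2)) + (1/2) log u
    return u**1.5*logk - gammaln(u**1.5) + 0.5*np.log(u)

def make_sampler(k, M=1.02, rng=None):
    rng, logk = rng or np.random.default_rng(), np.log(k)
    d1 = lambda u: 1.5*np.sqrt(u)*(logk - digamma(u**1.5)) + 0.5/u
    d2 = lambda u: (0.75/np.sqrt(u))*(logk - digamma(u**1.5)) \
                   - 2.25*u*polygamma(1, u**1.5) - 0.5/u**2
    u = max(k**(2.0/3.0), 1.1)
    for _ in range(50):                 # Newton-Raphson for the mode of f(u)
        u -= d1(u)/d2(u)
    mu, sigma = u, np.sqrt(-1.0/d2(u))  # the dominating Normal g
    log_c = log_f(mu, logk) - norm.logpdf(mu, mu, sigma)  # match peaks of f and g
    def draw():
        while True:                     # steps 1-3 of the rejection sampler
            v = rng.normal(mu, sigma)
            if v >= 1 and np.log(rng.uniform()) < \
                    log_f(v, logk) - (np.log(M) + log_c + norm.logpdf(v, mu, sigma)):
                return v**1.5           # step 3: x = u^(3/2)
    return draw

draw = make_sampler(np.exp(5.0))
xs = [draw() for _ in range(2000)]
print(np.mean(xs))
```

With $k = \exp(5)$, nearly every Normal draw is accepted, and the sample mean lands near the mode $x \approx k$.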

This plot shows the logarithms of $g$ and $f$ as a function of $u$ for $k=\exp(5)$. Because the graphs are so close, we need to inspect their ratio to see what’s going on:

This displays the log ratio $\log(\exp(0.004)g(u)/f(u))$; the factor of $M = \exp(0.004)$ was included to assure the logarithm is positive throughout the main part of the distribution; that is, to assure $Mg(u) \ge f(u)$ except possibly in regions of negligible probability. By making $M$ sufficiently large you can guarantee that $M \cdot g$ dominates $f$ in all but the most extreme tails (which have practically no chance of being chosen in a simulation anyway). However, the larger $M$ is, the more frequently rejections will occur. As $k$ grows large, $M$ can be chosen very close to $1$, which incurs practically no penalty.

A similar approach works even for $k > \exp(2)$, but fairly large values of $M$ may be needed when $\exp(2) < k < \exp(5)$, because $f(u)$ is noticeably asymmetric. For instance, with $k = \exp(2)$, to get a reasonably accurate $g$ we need to set $M = \exp(1)$:

The upper red curve is the graph of $\log(\exp(1)g(u))$ while the lower blue curve is the graph of $\log(f(u))$. Rejection sampling of $f$ relative to $\exp(1)g$ will cause about 2/3 of all trial draws to be rejected, tripling the effort: still not bad. The right tail ($u > 10$ or $x > 10^{3/2} \sim 30$) will be under-represented in the rejection sampling (because $\exp(1)g$ no longer dominates $f$ there), but that tail comprises less than $\exp(-20) \sim 10^{-9}$ of the total probability.

To summarize, after an initial effort to compute the mode and evaluate the quadratic term of the power series of $f(u)$ around the mode (an effort that requires a few tens of function evaluations at most), you can use rejection sampling at an expected cost of between 1 and 3 (or so) evaluations per variate. The cost multiplier rapidly drops to 1 as $k = cd$ increases beyond 5.

Even when just one draw from $f$ is needed, this method is reasonable. It comes into its own when many independent draws are needed for the same value of $k$, for then the overhead of the initial calculations is amortized over many draws.

@Cardinal has asked, quite reasonably, for support of some of the hand-waving analysis in the foregoing. In particular, why should the transformation $x = u^{3/2}$ make the distribution approximately Normal?

In light of the theory of Box-Cox transformations, it is natural to seek some power transformation of the form $x = u^\alpha$ (for a constant $\alpha$, hopefully not too different from unity) that will make a distribution “more” Normal. Recall that all Normal distributions are simply characterized: the logarithms of their pdfs are purely quadratic, with zero linear term and no higher order terms. Therefore we can take any pdf and compare it to a Normal distribution by expanding its logarithm as a power series around its (highest) peak. We seek a value of $\alpha$ that makes (at least) the third power vanish, at least approximately: that is the most we can reasonably hope that a single free coefficient will accomplish. Often this works well.

But how to get a handle on this particular distribution? Upon effecting the power transformation, its pdf is

$$f(u) = \frac{k^{u^{\alpha}}}{\Gamma(u^{\alpha})}\, u^{\alpha-1}.$$

Take its logarithm and use Stirling’s asymptotic expansion of $\log(\Gamma)$:

$$\log(f(u)) \approx \log(k)\, u^\alpha + (\alpha - 1)\log(u) - \alpha u^\alpha \log(u) + u^\alpha - \log(2 \pi u^\alpha)/2 + c u^{-\alpha}$$

(for small values of $c$, which is not constant). This works provided $\alpha$ is positive, which we will assume to be the case (for otherwise we cannot neglect the remainder of the expansion).

Compute its third derivative (which, when divided by $3!$, will be the coefficient of the third power of $u$ in the power series) and exploit the fact that at the peak, the first derivative must be zero. This simplifies the third derivative greatly, giving (approximately, because we are ignoring the derivative of $c$)

$$-\frac{1}{2} u^{-(3+\alpha)} \alpha \left(2 \alpha(2 \alpha-3) u^{2 \alpha} + (\alpha^2 - 5\alpha +6)u^\alpha + 12 c \alpha \right).$$

When $k$ is not too small, $u$ will indeed be large at the peak. Because $\alpha$ is positive, the dominant term in this expression is the $2\alpha$ power, which we can set to zero by making its coefficient vanish:

$$2 \alpha - 3 = 0.$$

That’s why $\alpha = 3/2$ works so well: with this choice, the coefficient of the cubic term around the peak behaves like $u^{-3}$, which is close to $\exp(-2k)$. Once $k$ exceeds 10 or so, you can practically forget about it, and it’s reasonably small even for $k$ down to 2. The higher powers, from the fourth on, play less and less of a role as $k$ gets large, because their coefficients grow proportionately smaller, too. Incidentally, the same calculations (based on the second derivative of $\log(f(u))$ at its peak) show the standard deviation of this Normal approximation is slightly less than $\frac{2}{3}\exp(k/6)$, with the error proportional to $\exp(-k/2)$.
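The simplification of the third derivative can be checked symbolically. A sketch assuming SymPy: treat $c$ as a constant (matching the “ignoring the derivative of $c$” caveat), eliminate $\log k$ using the first-order condition at the peak, and compare against the displayed expression numerically at an arbitrary point:

```python
import sympy as sp

u, a, c, K = sp.symbols('u alpha c K', positive=True)  # K stands for log(k)

# Stirling approximation to log f(u) from the text, with c held constant
logf = K*u**a + (a - 1)*sp.log(u) - a*u**a*sp.log(u) + u**a \
       - sp.log(2*sp.pi*u**a)/2 + c*u**(-a)

d1 = sp.diff(logf, u)
Ksol = sp.solve(sp.Eq(d1, 0), K)[0]       # first derivative vanishes at the peak
d3 = sp.diff(logf, u, 3).subs(K, Ksol)    # third derivative at the peak

# the expression quoted in the text
quoted = -sp.Rational(1, 2)*u**(-(3 + a))*a*(
    2*a*(2*a - 3)*u**(2*a) + (a**2 - 5*a + 6)*u**a + 12*c*a)

residual = float((d3 - quoted).subs({u: 3.7, a: 1.6, c: 0.3}))
print(residual)  # ~ 0, up to floating point
```

The residual vanishes for any choice of $u$, $\alpha$, and $c$, confirming the displayed formula under the stated approximations.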