Suppose you define:
where Φ−1 is the inverse of the CDF of the standard normal distribution.
My question is: Is there a simple distribution that Y follows, or that can approximate Y? I’m asking because I have a strong suspicion based on simulation results (shown below) that Y converges to a normal distribution when α and β are high, but I don’t know why it would mathematically. (Of course when α=1;β=1, X would be uniform and Y would be the standard normal, but why would it be true for higher values?).
If this does converge to a normal, what would the parameters of that normal be, in terms of α and β? (I expect the mean would be Φ−1(αα+β) since that’s the transformation of the mode, but I don’t know the standard deviation).
(Put another way, this could be asking “does Φ(Norm(μ,σ)) converge to a beta distribution, for some direction of μ and σ“? I’m not sure whether that’s easier to answer).
Here I show why I have the suspicion that the result is normal (since I can’t back it up with math). Simulation of Y can be done in R with
rnorm. For example, choosing the high parameters α=3000 and β=7000:
hist(qnorm(rbeta(5000, 3000, 7000)))
This does look normal, and
qqnormand the Shapiro-Wilk test (in which normality is the null hypothesis) suggest so as well:
qqnorm(qnorm(rbeta(5000, 3000, 7000)))
shapiro.test(qnorm(rbeta(5000, 3000, 7000))) #> #> Shapiro-Wilk normality test #> #> data: qnorm(rbeta(5000, 3000, 7000)) #> W = 0.99954, p-value = 0.2838
To explore the normality a bit deeper, I perform 2,000 simulations, each time simulating 5,000 values from Y, then performing the test to compare it to the normal. (I chose 5K values because that’s the maximum
shapiro.testcan handle, and maximizes the power to detect deviations from the norm).
If the distribution truly were normal we would expect p-values would be uniform (since the null is true). They are indeed close to uniform, suggesting that the distribution is very close to normal:
hist(replicate(2000, shapiro.test(qnorm(rbeta(5000, 3000, 7000)))$p.value))
Some experimentation shows that the higher α and β are, the close the distribution gets to normal (e.g.
rbeta(5000, 3, 7)is quite far from normal, but try
hist(replicate(2000, shapiro.test(qnorm(rbeta(5000, 30, 70)))$p.value))and it appears to be somewhere in between).
You have rediscovered part of the construction described at Central Limit Theorem for Sample Medians, which illustrates an analysis of the median of a sample. (The analysis obviously applies, mutatis mutandis, to any quantile, not just the median). Therefore it is no surprise that for large Beta parameters (corresponding to large samples) a Normal distribution arises under the transformation described in the question. What is of interest is how close to Normal the distribution is even for small Beta parameters. That deserves an explanation.
I will sketch an analysis below. To keep this post at a reasonable length, it involves a lot of suggestive hand-waving: I aim only to point out the key ideas. Let me therefore summarize the results here:
When α is close to β, everything is symmetric. This causes the transformed distribution already to look Normal.
The functions of the form Φα−1(x)(1−Φ(x))β−1 look fairly Normal in the first place, even for small values of α and β (provided both exceed 1 and their ratio is not too close to 0 or 1).
The apparent Normality of the transformed distribution is due to the fact that its density consists of a Normal density multiplied by a function in (2).
As α and β increase, the departure from Normality can be measured in the remainder terms in a Taylor series for the log density. The term of order n decreases in proportion to the (n−2)/2 powers of α and β. This implies that eventually, for sufficiently large α and β, all terms of power n=3 or greater have become relatively small, leaving only a quadratic: which is precisely the log density of a Normal distribution.
Collectively, these behaviors nicely explain why even for small α and β the non-extreme quantiles of an iid Normal sample look approximately Normal.
Because it can be useful to generalize, let F be any distribution function, although we have in mind F=Φ.
The density function g(y) of a Beta(α,β) variable is, by definition, proportional to
Letting y=F(x) be the probability integral transform of x and writing f for the derivative of F, it is immediate that x has a density proportional to
Because this is a monotonic transformation of a strongly unimodal distribution (a Beta), unless F is rather strange, the transformed distribution will be unimodal, too. To study how close to Normal it might be, let’s examine the logarithm of its density,
where C is an irrelevant constant of normalization.
Expand the components of logG(x;α,β) in Taylor series to order three around a value x0 (which will be close to a mode). For instance, we may write the expansion of logF as
for some h with |h|≤|x−x0|. Use a similar notation for log(1−F) and logf.
The linear term in (1) thereby becomes
When x0 is a mode of G(;α,β), this expression is zero. Note that because the coefficients are continuous functions of x0, as α and β are varied, the mode x0 will vary continuously too. Moreover, once α and β are sufficiently large, the cf1 term becomes relatively inconsequential. If we aim to study the limit as α→∞ and β→∞ for which α:β stays in constant proportion γ, we may therefore once and for all choose a base point x0 for which
A nice case is where γ=1, where α=β throughout, and F is symmetric about 0. In that case it is obvious x0=F(0)=1/2.
We have achieved a method whereby (a) in the limit, the first-order term in the Taylor series vanishes and (b) in the special case just described, the first-order term is always zero.
These are the sum
Comparing to a Normal distribution, whose quadratic term is −(1/2)(x−x0)2/σ2, we may estimate that −1/(2g2(α,β)) is approximately the variance of G. Let us standardize G by rescaling x by its square root. we don’t really need the details; it suffices to understand that this rescaling is going to multiply the coefficient of (x−x0)n in the Taylor expansion by (−1/(2g2(α,β)))n/2.
Here’s the punchline: the term of order n in the Taylor expansion is, according to our notation,
After standardization, it becomes
Both of the gi are affine combination of α and β. By raising the denominator to the n/2 power, the net behavior is of order −(n−2)/2 in each of α and β. As these parameters grow large, then, each term in the Taylor expansion after the second decreases to zero asymptotically. In particular, the third-order remainder term becomes arbitrarily small.
The case when F is normal
The vanishing of the remainder term is particularly fast when F is standard Normal, because in this case f(x) is purely quadratic: it contributes nothing to the remainder terms. Consequently, the deviation of G from normality depends solely on the deviation between Fα−1(1−F)β−1 and normality.
This deviation is fairly small even for small α and β. To illustrate, consider the case α=β. G is symmetric, whence the order-3 term vanishes altogether. The remainder is of order 4 in x−x0=x.
Here is a plot showing how the standardized fourth order term changes with small values of α>1:
The value starts out at 0 for α=β=1, because then the distribution obviously is Normal (Φ−1 applied to a uniform distribution, which is what Beta(1,1) is, gives a standard Normal distribution). Although it increases rapidly, it tops off at less than 0.008–which is practically indistinguishable from zero. After that the asymptotic reciprocal decay kicks in, making the distribution ever closer to Normal as α increases beyond 2.