# How can the central limit theorem hold for distributions which have limits on the random variable?

I’ve always struggled with, and have never been given a good answer to, how the central limit theorem (the classical version, where the distribution of sample means approaches normality) can apply to, say, a Poisson or Gamma distribution, where $P(x<0)=0$. Or, for that matter, to any other distribution for which $\exists X:X \neq -\infty ,F(X)=0$, or perhaps $\exists X:X \neq \infty, 1-F(X)=0$.

As an example, given a Gamma distribution, as the number of samples $n \rightarrow \infty$, $P( \bar{X} = \alpha) \rightarrow 1$, $\forall \alpha \geq 0$, for some $\bar{X}_i$. But if $\alpha<0$, $P(\bar{X}=\alpha)=0$. There simply will never, EVER be an $\bar{X}_i<0$. This suggests to me that the distribution of $\bar{X}$ cannot be, nor approach, normality because $f(\bar{X})$ must necessarily be $0$, $\forall \bar{X}<0$, which does not meet the requirements of a normal distribution where $f(y)>0, \forall y \in R$.

I’d feel much better about life and anything based on the CLT if someone could help me understand where my logic has gone astray.

This is an excellent question, since it shows that you are thinking about the intuitive aspects of the theorems you are learning. That puts you ahead of most students who learn the CLT. Here I will try to supply you with an explanation for how it is possible for the CLT to hold for random variables with restricted support.

The classical central limit theorem applies to any sequence $$X_1, X_2, X_3, \ldots \sim \text{IID Dist}(\mu, \sigma^2)$$ consisting of independent and identically distributed random variables with arbitrary mean $$\mu$$ and finite non-zero variance $$0 < \sigma^2 < \infty$$. Now, suppose that you have such a sequence, and they are bounded by $$x_{\text{min}} \leqslant X_i \leqslant x_{\text{max}}$$, and therefore their support does not cover the whole real line.

The central limit theorem relates to the distribution of the sample mean $$\bar{X}_n \equiv \tfrac{1}{n} \sum_{i=1}^n X_i$$, and from the restricted support on the underlying random variables in the sequence, this statistic must also obey the bounds $$x_{\text{min}} \leqslant \bar{X}_n \leqslant x_{\text{max}}$$. So, the plot thickens – the sample mean that is the subject of the theorem is also bounded! How can the CLT hold if this is the case?

Central Limit Theorem (CLT): Letting $$\Phi$$ be the standard normal distribution function, for every real $$z$$ we have:

$$\lim_{n \rightarrow \infty} \mathbb{P} \Big( \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \leqslant z \Big) = \Phi (z).$$

Approximation arising from CLT: For large $$n$$ we have the approximate distribution:

$$\bar{X}_n \overset{\text{Approx}}{\sim} \text{N} \Big( \mu, \frac{\sigma^2}{n} \Big).$$

Your objection is that the distributional approximation arising out of this theorem approximates a distribution with bounded support by one with unbounded support, and hence cannot be exactly correct. You are right about that: the distributional approximation for large $$n$$ is only an approximation, and it does indeed mis-specify the probability that the sample mean is outside its bounds (by giving this event positive probability).

However, the CLT is not a statement about a distributional approximation for finite $$n$$. It is about the limiting distribution of the standardised sample mean. The bounds on this quantity are:

$$z_{\text{min}} = \frac{x_{\text{min}} - \mu}{\sigma / \sqrt{n}} \leqslant \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \leqslant \frac{x_{\text{max}} - \mu}{\sigma / \sqrt{n}} = z_{\text{max}}.$$

For any finite sample size, the normal approximation gives a non-zero probability to values outside the support (which of course have a true probability of zero):

\begin{align} P_n^{\text{(erroneous)}} &\equiv \mathbb{P}(\bar{X}_n \notin [x_{\text{min}}, x_{\text{max}}] \mid \text{Normal Approx}) \\[6pt] &= 1 - \Phi(z_{\text{max}}) + \Phi(z_{\text{min}}). \end{align}

Now, as $$n \rightarrow \infty$$ we have limits $$z_{\text{min}} \rightarrow -\infty$$ and $$z_{\text{max}} \rightarrow \infty$$ which means that the bounds of the standardised sample mean become wider and wider and converge in the limit to the whole real line. (Or to put it slightly more formally, for any point in the real line, the bounds will come to encompass that point for some sufficiently large $$n$$.) A consequence of this is that the probability ascribed to the parts outside the bounds by the normal distribution converges to zero as $$n \rightarrow \infty$$. That is, we have $$\lim_{n \rightarrow \infty} P_n^{\text{(erroneous)}} = 0$$.
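To make this limit concrete, here is a minimal Python sketch that evaluates the erroneous probability from the formula above. The function names `phi` and `erroneous_probability` are illustrative, and the example assumes an exponential population with rate one (a Gamma distribution with unit shape), so that $$\mu = \sigma = 1$$, $$x_{\text{min}} = 0$$ and there is no upper bound.

```python
import math


def phi(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def erroneous_probability(n, mu, sigma, x_min=-math.inf, x_max=math.inf):
    """Mass that the N(mu, sigma^2 / n) approximation places outside [x_min, x_max]."""
    se = sigma / math.sqrt(n)      # standard error of the sample mean
    z_min = (x_min - mu) / se      # standardised lower bound
    z_max = (x_max - mu) / se      # standardised upper bound
    return 1.0 - phi(z_max) + phi(z_min)


# Assumed example: exponential population with rate 1 (mu = sigma = 1, support [0, inf)).
for n in (1, 4, 25, 100):
    p = erroneous_probability(n, mu=1.0, sigma=1.0, x_min=0.0)
    print(f"n = {n:3d}   erroneous probability = {p:.3e}")
```

In this assumed case the formula reduces to $$\Phi(-\sqrt{n})$$, so the erroneous probability shrinks very quickly as the sample size grows.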

Here we get at the heart of the issue regarding your misgivings about the CLT. It is true that for any finite $$n$$, a normal approximation to the distribution of the sample mean will give positive probability to subsets of values that are outside the bounds of the true support. However, when we take the limit $$n \rightarrow \infty$$ this erroneous positive probability converges to zero. The distributional approximation to the standardised sample mean converges to the true distribution of this quantity in the limit, even though the approximation does not hold exactly for finite $$n$$.
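One way to see this convergence numerically is to compare the exact distribution function of the standardised sample mean with $$\Phi$$. The sketch below again assumes an exponential population with rate one, chosen because the sum of $$n$$ such observations has an Erlang distribution (a Gamma with integer shape) whose distribution function has a closed form. The helper names and the grid of $$z$$ values are illustrative choices, and the reported gap is only the largest one found on that grid.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def erlang_cdf(x, n):
    """P(S <= x) where S is the sum of n independent Exp(1) variables."""
    if x <= 0.0:
        return 0.0                 # the sum can never be negative
    log_x = math.log(x)
    # Terms exp(-x) * x^k / k! computed in log space to avoid overflow.
    tail = sum(math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail


def max_cdf_gap(n, z_lo=-4.0, z_hi=4.0, step=0.01):
    """Largest |F_n(z) - Phi(z)| on a grid, where F_n is the exact distribution
    function of the standardised sample mean (S - n) / sqrt(n)."""
    steps = int(round((z_hi - z_lo) / step))
    gap = 0.0
    for i in range(steps + 1):
        z = z_lo + i * step
        exact = erlang_cdf(n + z * math.sqrt(n), n)
        gap = max(gap, abs(exact - phi(z)))
    return gap


gaps = {n: max_cdf_gap(n) for n in (1, 10, 100)}
for n, gap in gaps.items():
    print(f"n = {n:3d}   largest gap on grid = {gap:.4f}")
```

For every finite $$n$$ the exact distribution function is zero below the standardised lower bound $$-\sqrt{n}$$ while $$\Phi$$ is not, but the gap between the two functions shrinks as $$n$$ increases.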

Using some statistical kung-fu to improve the approximation: You are right to have misgivings about the fact that the normal approximation from the CLT gives an erroneous non-zero probability to values outside the bounds of the true distribution. Is there anything that can be done about this?

Well, it turns out there is. You see, the normal distribution is not the only approximating distribution that arises from the CLT. In fact, any sequence of distributions that converges to the normal can also be used for the approximation. This is extremely useful in cases where you have a quantity that is known to have bounded support, and you also want to approximate its distribution with the CLT.

As an example, suppose you are interested in the scaled sample variance $$S_n^2/\sigma^2$$ for large $$n$$. This quantity is always non-negative, yet it obeys a CLT result that says that its distribution converges to the normal distribution (so long as the kurtosis $$\kappa$$ of the underlying population is finite). So, for large $$n$$ you can use the CLT to get the (not particularly wonderful) approximating distribution:

$$\frac{S_n^2}{\sigma^2} \overset{\text{Approx}}{\sim} \text{N} \Bigg( 1, \frac{1}{n} \bigg( \kappa - \frac{n-3}{n-1} \bigg) \Bigg),$$

which gives an erroneous non-zero probability to the negative values. However, following an alternative method used in O’Neill (2014) (Result 14, p. 285) you can use the asymptotically equivalent (and now wonderful) approximating distribution:

$$\frac{S_n^2}{\sigma^2} \overset{\text{Approx}}{\sim} \frac{\text{ChiSq} (DF_n)}{DF_n} \quad \quad \quad \quad \quad DF_n \equiv \frac{2n}{\kappa - (n-3)/(n-1)},$$

which reduces to the exact distribution for an underlying normal population, and does not give positive probability to the (impossible) negative values. Other asymptotically equivalent approximating distributions are also possible, so the point here is that the CLT always gives you a range of available asymptotic distributions, and we can choose the one that has other good properties (e.g., not giving positive probability to impossible values).
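As a quick check on the two approximations above, the following Python sketch computes the variance used by the normal approximation, the probability that approximation assigns to negative values, and the degrees of freedom $$DF_n$$. The function names are illustrative, and the kurtosis value $$\kappa = 9$$ (that of an exponential population) is only an example input.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def approx_variance(n, kappa):
    """Variance of the normal approximation to S_n^2 / sigma^2."""
    return (kappa - (n - 3) / (n - 1)) / n


def normal_negative_mass(n, kappa):
    """Probability the normal approximation N(1, variance) gives to values below zero."""
    sd = math.sqrt(approx_variance(n, kappa))
    return phi((0.0 - 1.0) / sd)


def chisq_df(n, kappa):
    """Degrees of freedom DF_n of the scaled chi-squared approximation."""
    return 2.0 * n / (kappa - (n - 3) / (n - 1))


n = 10
kappa = 9.0  # assumed example: kurtosis of an exponential population
print("normal approx. mass below zero:", normal_negative_mass(n, kappa))
print("chi-squared degrees of freedom:", chisq_df(n, kappa))
# ChiSq(DF) / DF has mean 1 and variance 2 / DF, matching the normal approximation.
print("variance match:", math.isclose(2.0 / chisq_df(n, kappa), approx_variance(n, kappa)))
# For a normal population (kappa = 3) the formula gives DF_n = n - 1.
print("normal population DF:", chisq_df(n, 3.0))
```

The two approximations share the same mean and variance, which is why they are asymptotically equivalent, but only the normal one puts mass below zero. Setting $$\kappa = 3$$ gives $$DF_n = n - 1$$, the familiar degrees of freedom for a normal population.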