Law of Large Numbers for whole distributions

I’m aware of the Law(s) of Large Numbers, concerning the mean. However, intuitively I’d expect not just the mean, but also the observed relative frequencies (or the histogram, in the case of a continuous distribution) to approach the theoretical PMF/PDF as the number of trials goes to infinity.

  1. Is my intuition wrong? Always or only for some degenerate cases (e.g. Cauchy)?
  2. If not, is there a special name for that law?

Answer

While the law of large numbers is framed in terms of “means”, this actually gives you a large amount of flexibility to show convergence of other types of quantities. In particular, you can use indicator functions to get convergence results for the probabilities of any specified event. To see how to do this, suppose we start with a sequence X_1, X_2, X_3, \ldots \sim \text{IID } F_X and note that the law of large numbers says that (in various probabilistic senses) we have the following convergence:

\frac{1}{n} \sum_{i=1}^n X_i \rightarrow \mathbb{E}(X)
\quad \quad \quad \quad \quad
\text{as } n \rightarrow \infty.
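This convergence is easy to see numerically. The sketch below (the exponential distribution and the seed are arbitrary choices for illustration) draws increasingly large IID samples and watches the sample mean approach \mathbb{E}(X):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# IID draws from an exponential distribution with mean 2
# (an arbitrary choice of distribution for illustration)
true_mean = 2.0
for n in [100, 10_000, 1_000_000]:
    x = rng.exponential(scale=true_mean, size=n)
    # The sample mean (1/n) * sum(X_i) approaches E(X) = 2 as n grows
    print(n, x.mean())
```

Running this, the gap between the sample mean and 2 shrinks roughly like 1/\sqrt{n}, as the central limit theorem would suggest.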

In the sections below I will show how you can use this basic result to show that the empirical CDF converges to the true CDF of the underlying distribution in certain useful senses. This will also show you how the law of large numbers can be applied in a creative way to prove convergence results for other things that don’t look like “means” of quantities (but actually are).


Pointwise convergence of the empirical CDF to the true CDF: In your question you are interested in the convergence of the empirical distribution function to the true distribution function F_X. Let’s start by looking at a particular point x, examining the sequence of values Y_1, Y_2, Y_3, \ldots defined by Y_i \equiv \mathbb{I}(X_i \leqslant x). This sequence is also IID, so the law of large numbers says that (in various probabilistic senses) we have the following convergence:

\frac{1}{n} \sum_{i=1}^n Y_i \rightarrow \mathbb{E}(Y)
\quad \quad \quad \quad \quad
\text{as } n \rightarrow \infty.

Now, at the point x the empirical distribution function for the sequence \mathbf{X} and the true CDF for the distribution can be written respectively as:

\begin{align}
\hat{F}_n(x)
&\equiv \frac{1}{n} \sum_{i=1}^n \mathbb{I}(X_i \leqslant x)
= \frac{1}{n} \sum_{i=1}^n Y_i, \\[12pt]
F_X(x)
&\equiv \mathbb{P}(X_i \leqslant x)
= \mathbb{E}(Y). \\[6pt]
\end{align}

(The latter result follows from the fact that \mathbb{E}(Y) = \mathbb{P}(Y=1) for any indicator variable Y.) We can therefore re-frame the previous convergence statement from the law of large numbers to give the pointwise convergence result:

\hat{F}_n(x) \rightarrow F_X(x)
\quad \quad \quad \quad \quad
\text{as } n \rightarrow \infty.

You can see that this demonstrates that the empirical CDF converges pointwise to the true CDF for IID data; this is a direct consequence of the law of large numbers. Specifically, the weak law of large numbers establishes pointwise convergence in probability, and the strong law of large numbers establishes pointwise convergence almost surely.
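The indicator-function trick above translates directly into code. In this sketch (the standard normal distribution, the evaluation point x_0 = 1, and the seed are all arbitrary illustrative choices), the empirical CDF at a fixed point is literally the sample mean of the indicators \mathbb{I}(X_i \leqslant x_0), and it converges to F_X(x_0):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
x0 = 1.0  # fixed evaluation point (arbitrary choice)

# True standard normal CDF at x0, via the error function:
# Phi(x) = (1/2) * (1 + erf(x / sqrt(2)))
F_true = 0.5 * (1 + math.erf(x0 / math.sqrt(2)))

for n in [100, 10_000, 1_000_000]:
    x = rng.standard_normal(n)
    # The empirical CDF at x0 is the mean of the indicators I(X_i <= x0),
    # i.e., a sample mean of the IID sequence Y_i -- so the LLN applies
    F_hat = np.mean(x <= x0)
    print(n, F_hat, F_true)
```

As n grows, F_hat settles toward F_true \approx 0.8413, exactly the pointwise convergence \hat{F}_n(x_0) \rightarrow F_X(x_0) derived above.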


Uniform convergence of the empirical CDF to the true CDF: To go further than the above result, you need to use the uniform law of large numbers—or some other similar theorem—to establish uniform convergence of the empirical CDF to the true CDF. If you use the uniform law of large numbers then you can establish uniform convergence of the empirical CDF under some restrictive assumptions on the underlying CDF. However, there is actually a stronger theorem called the Glivenko–Cantelli theorem that establishes uniform convergence of the empirical CDF to the true CDF (almost surely) for any IID sequence of data. That is, the theorem proves that:

\sup_x | \hat{F}_n(x) - F_X(x) | \overset{\text{a.s.}}{\rightarrow} 0
\quad \quad \quad \quad \quad
\text{as } n \rightarrow \infty.

If you would like to learn more about this part, it is worth having a look at the proofs of the uniform law of large numbers and the Glivenko–Cantelli theorem to see how each of them works to establish uniform convergence. The former theorem is broader, but it comes with some restrictions on the input function. The latter theorem applies specifically to the empirical CDF of IID data, but it establishes uniform convergence (almost surely) without any additional assumptions.
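The uniform (sup-norm) convergence in the Glivenko–Cantelli theorem can also be checked numerically. A convenient trick: for Uniform(0,1) data the true CDF is simply F(x) = x, and the supremum of |\hat{F}_n(x) - F(x)| is attained at the jump points of the empirical CDF, so it can be computed exactly from the order statistics. (The uniform distribution and the seed here are arbitrary illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(2)

def ks_distance(sample):
    """Sup-norm distance between the empirical CDF of a Uniform(0,1)
    sample and the true CDF F(x) = x, evaluated at the jump points."""
    x = np.sort(sample)
    n = len(x)
    # At x_(i), the empirical CDF jumps from (i-1)/n up to i/n, so the
    # supremum over all x is attained just before or just after a jump.
    upper = np.arange(1, n + 1) / n - x  # F_hat(x_(i)) - F(x_(i))
    lower = x - np.arange(0, n) / n      # F(x_(i)) - F_hat(x_(i)-)
    return max(upper.max(), lower.max())

for n in [100, 10_000, 1_000_000]:
    d = ks_distance(rng.uniform(size=n))
    print(n, d)  # the sup-norm distance shrinks toward 0
```

This sup-norm distance is the Kolmogorov–Smirnov statistic; the Glivenko–Cantelli theorem guarantees it converges to zero almost surely, and the numerical output shows it shrinking as n grows.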

Attribution
Source: Link, Question Author: Igor F., Answer Author: Scortchi – Reinstate Monica