Why don’t asymptotically consistent estimators have zero variance at infinity?

I know that the statement in question is wrong because estimators cannot have asymptotic variances that are lower than the Cramér-Rao bound.

However, if asymptotic consistency means that an estimator converges in probability to a value, then doesn’t this also mean that its variance becomes $0$?

Where in this train of thought am I wrong?


Convergence of a sequence of random variables in probability does not imply convergence of their variances, nor even that their variances get anywhere near $0.$ In fact, their means may converge to a constant yet their variances can still diverge.

Examples and counterexamples

Construct counterexamples by creating ever more rare events that are increasingly far from the mean: the squared distance from the mean can overwhelm the decreasing probability and cause the variance to do anything (as I will proceed to show).

For instance, scale a Bernoulli$(1/n)$ variate by $n^{p}$ for some power $p$ to be determined. That is, define the sequence of random variables $X_n$ by

$$\Pr(X_n=n^{p})=1/n, \qquad \Pr(X_n=0)= 1 - 1/n.$$

As $n\to \infty$, $\Pr(X_n=0)\to 1$, so this sequence converges in probability to $0;$ its expectation $n^{p-1}$ even converges to $0$ provided $p\lt 1;$ but for $p\gt 1/2$ its variance $n^{2p-1}(1-1/n)$ diverges.
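The moments above can be checked directly. Here is a small sketch (the formulas are exactly those of the scaled-Bernoulli construction; the function name `moments` is my own) showing that for $p=1$ the probability of being nonzero vanishes while the variance grows without bound:

```python
# X_n equals n**p with probability 1/n and 0 otherwise, so
#   E[X_n]   = n**p * (1/n)                  = n**(p-1)
#   Var(X_n) = n**(2p) * (1/n) * (1 - 1/n)   = n**(2p-1) * (1 - 1/n)

def moments(n, p):
    """Exact mean and variance of the scaled Bernoulli X_n."""
    mean = n ** (p - 1)
    var = n ** (2 * p - 1) * (1 - 1 / n)
    return mean, var

for n in (10, 100, 1000, 10000):
    mean, var = moments(n, p=1.0)
    # P(X_n = 0) -> 1 (convergence in probability to 0),
    # yet Var(X_n) = n - 1 -> infinity.
    print(f"n={n:6d}  P(X_n=0)={1 - 1/n:.4f}  E[X_n]={mean:.3f}  Var={var:.1f}")
```

With $p=1$ the mean is constant at $1$ while the variance is $n-1$: convergence in probability with divergent variance.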


Many other behaviors are possible:

  • Because negative powers $2p-1$ of $n$ converge to $0,$ the variance
    converges to $0$ for $p\lt 1/2:$ the variables “squeeze down” to $0$
    in some sense.

  • An interesting edge case is $p=1/2,$ for which the variance converges
    to $1.$

  • By varying $p$ above and below $1/2$ depending on $n$ you can even
    make the variance not converge at all. For instance, let $p(n)=0$
    for even $n$ and $p(n)=1$ for odd $n.$
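All three regimes, together with the oscillating choice of $p(n)$, can be tabulated from the same variance formula (a sketch; `var_xn` is an assumed helper name):

```python
# Variance of X_n from the construction above: n**(2p-1) * (1 - 1/n).

def var_xn(n, p):
    return n ** (2 * p - 1) * (1 - 1 / n)

# p < 1/2: variance -> 0; p = 1/2: variance -> 1; p > 1/2: variance diverges.
for p in (0.25, 0.5, 0.75):
    print(f"p={p}:", [round(var_xn(n, p), 4) for n in (10, 1000, 100000)])

# Oscillating case p(n) = 0 for even n, 1 for odd n:
# the variance alternates between roughly 1/n and n - 1, so it never converges.
print([round(var_xn(n, n % 2), 4) for n in range(10, 16)])
```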

A direct connection with estimation

Finally, a reasonable objection is that abstract sequences of random variables are not really “estimators” of anything. But they can nevertheless be involved in estimation. For instance, let $t_n$ be a sequence of statistics, intended to estimate some numerical property $\theta(F)$ of the common distribution of an (arbitrarily large) iid random sample $(Y_1,Y_2,\ldots,Y_n,\ldots)$ of $F.$ This induces a sequence of random variables

$$T_n = t_n(Y_1,Y_2,\ldots,Y_n).$$

Modify this sequence by choosing any value of $p$ (as above) you like and set

$$T^\prime_n = T_n + (X_n - n^{p-1}).$$

The parenthesized term makes a zero-mean adjustment to $T_n,$ so that if $T_n$ is a reasonable estimator of $\theta(F),$ then so is $T^\prime_n.$ (With some imagination we can conceive of situations where $T_n^\prime$ could yield better estimates than $T_n$ with probability close to $1.$) However, if you make the $X_n$ independent of $Y_1,\ldots, Y_n,$ the variance of $T^\prime_n$ will be the sum of the variances of $T_n$ and $X_n,$ which you thereby can cause to diverge.
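Here is a sketch of this construction under one hypothetical setup: $T_n$ is the sample mean of $n$ iid $N(\theta,1)$ draws (a perfectly reasonable estimator of $\theta$), $X_n$ is the independent scaled Bernoulli with $p=1$, and $T^\prime_n = T_n + (X_n - n^{p-1})$. The adjusted estimator remains unbiased, but its variance is dominated by $\operatorname{Var}(X_n) = n-1$:

```python
import random

def t_prime(n, theta=2.0, p=1.0, rng=random):
    """One draw of the adjusted estimator T'_n = T_n + (X_n - n**(p-1))."""
    t_n = sum(rng.gauss(theta, 1) for _ in range(n)) / n   # sample mean T_n
    x_n = n ** p if rng.random() < 1 / n else 0.0          # independent X_n
    return t_n + (x_n - n ** (p - 1))                      # zero-mean adjustment

random.seed(0)
n = 1000
draws = [t_prime(n) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# Var(T'_n) = Var(T_n) + Var(X_n) = 1/n + (n - 1), dwarfing Var(T_n) = 1/n.
print(f"empirical mean ~ {mean:.2f} (theta = 2), empirical variance ~ {var:.1f}")
```

Most draws of $T^\prime_n$ are slightly below $T_n$ (by $n^{p-1}$), but the rare event $X_n = n^p$ produces an enormous outlier, which is exactly what keeps the variance large even as $T^\prime_n \to \theta$ in probability.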

Source: Link, Question Author: Heisenberg, Answer Author: whuber
