# Why don’t asymptotically consistent estimators have zero variance at infinity?

I know that the statement in question is wrong because estimators cannot have asymptotic variances lower than the Cramér-Rao bound.

However, if asymptotic consistency means that an estimator converges in probability to a value, then doesn’t this also mean that its variance goes to $$0$$?

Where in this train of thought am I wrong?

Convergence of a sequence of random variables in probability does not imply convergence of their variances, nor even that their variances get anywhere near $$0.$$ In fact, their means may converge to a constant yet their variances can still diverge.

### Examples and counterexamples

Construct counterexamples by creating ever more rare events that are increasingly far from the mean: the squared distance from the mean can overwhelm the decreasing probability and cause the variance to do anything (as I will proceed to show).

For instance, scale a Bernoulli$$(1/n)$$ variate by $$n^{p}$$ for some power $$p$$ to be determined. That is, define the sequence of random variables $$X_n$$ by

\begin{aligned} &\Pr(X_n=n^{p})=1/n \\ &\Pr(X_n=0)= 1 - 1/n. \end{aligned}

As $$n\to \infty$$, because $$\Pr(X_n=0)\to 1$$ this converges in probability to $$0;$$ its expectation $$n^{p-1}$$ even converges to $$0$$ provided $$p\lt 1;$$ but for $$p\gt 1/2$$ its variance $$n^{2p-1}(1-1/n)$$ diverges.
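These moments can be checked by direct computation. Here is a quick sketch in Python (the choice $$p=1$$ below is just one illustrative value):

```python
# Exact moments of X_n, where P(X_n = n^p) = 1/n and P(X_n = 0) = 1 - 1/n.
def moments(n, p):
    mean = n**p * (1 / n)                 # E[X_n] = n^{p-1}
    var = n**(2 * p) * (1 / n) - mean**2  # Var(X_n) = n^{2p-1}(1 - 1/n)
    return mean, var

# With p = 1: P(X_n != 0) = 1/n -> 0, so X_n -> 0 in probability,
# yet E[X_n] = 1 for every n while Var(X_n) = n - 1 diverges.
for n in (10, 100, 1000):
    print(n, moments(n, p=1))
```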

Many other behaviors are possible:

• Because negative powers $$2p-1$$ of $$n$$ converge to $$0,$$ the variance
converges to $$0$$ for $$p\lt 1/2:$$ the variables “squeeze down” to $$0$$
in some sense.

• An interesting edge case is $$p=1/2,$$ for which the variance converges
to $$1.$$

• By varying $$p$$ above and below $$1/2$$ depending on $$n$$ you can even
make the variance not converge at all. For instance, let $$p(n)=0$$
for even $$n$$ and $$p(n)=1$$ for odd $$n.$$
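All three regimes, and the oscillating case, follow from the closed form $$n^{2p-1}(1-1/n)$$ and can be verified numerically (a small sketch; the particular values of $$n$$ are arbitrary):

```python
# Var(X_n) = n^{2p-1} * (1 - 1/n) for the scaled-Bernoulli construction above.
def variance(n, p):
    return n**(2 * p - 1) * (1 - 1 / n)

# Below, at, and above the edge case p = 1/2: the variance tends
# to 0, to 1, and to infinity, respectively.
for p in (0.25, 0.5, 0.75):
    print(p, [variance(n, p) for n in (10**2, 10**4, 10**6)])

# p(n) = 0 for even n, p(n) = 1 for odd n: the variance oscillates
# between roughly 0 and roughly n, so it has no limit at all.
print([variance(n, n % 2) for n in (100, 101, 10**4, 10**4 + 1)])
```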

### A direct connection with estimation

Finally, a reasonable objection is that abstract sequences of random variables are not really “estimators” of anything. But they can nevertheless be involved in estimation. For instance, let $$t_n$$ be a sequence of statistics intended to estimate some numerical property $$\theta(F)$$ of the common distribution $$F$$ of an (arbitrarily large) iid random sample $$(Y_1,Y_2,\ldots,Y_n,\ldots).$$ This induces a sequence of random variables

$$T_n = t_n(Y_1,Y_2,\ldots,Y_n).$$

Modify this sequence by choosing any value of $$p$$ (as above) you like and set

$$T^\prime_n = T_n + (X_n - n^{p-1}).$$

The parenthesized term makes a zero-mean adjustment to $$T_n,$$ so that if $$T_n$$ is a reasonable estimator of $$\theta(F),$$ then so is $$T^\prime_n.$$ (With some imagination we can conceive of situations where $$T_n^\prime$$ could yield better estimates than $$T_n$$ with probability close to $$1.$$) However, if you make the $$X_n$$ independent of $$Y_1,\ldots, Y_n,$$ the variance of $$T^\prime_n$$ will be the sum of the variances of $$T_n$$ and $$X_n,$$ which you can thereby cause to diverge.
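A small Monte Carlo sketch makes this concrete. The specific choices here are all illustrative assumptions, not part of the argument: take $$T_n$$ to be the sample mean of $$n=100$$ draws from $$F=$$ Normal$$(\theta,1)$$ with $$\theta=5,$$ and take $$p=1$$:

```python
import random
import statistics

random.seed(0)
theta, n, p = 5.0, 100, 1.0   # p > 1/2, so Var(X_n) grows with n

def t_prime():
    # T_n: the sample mean of n iid draws from F = Normal(theta, 1)
    t_n = statistics.fmean(random.gauss(theta, 1) for _ in range(n))
    # X_n: equals n^p with probability 1/n, else 0; independent of the Y's
    x_n = n**p if random.random() < 1 / n else 0.0
    # Zero-mean adjustment: E[X_n - n^{p-1}] = 0, so T'_n stays unbiased
    return t_n + (x_n - n**(p - 1))

draws = [t_prime() for _ in range(5000)]
print(statistics.fmean(draws))      # close to theta = 5
print(statistics.variance(draws))   # on the order of 1/n + n^{2p-1}(1 - 1/n) = 99.01,
                                    # vastly larger than Var(T_n) = 0.01
```

Replacing the constant $$p$$ with a function $$p(n)$$ that grows without bound would make the variance of $$T^\prime_n$$ diverge as $$n\to\infty$$ even though $$T^\prime_n$$ remains consistent.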