The issue has come up before, but I want to ask a specific question that I hope will elicit an answer that clarifies (and classifies) it:

In “Poor Man’s Asymptotics”, one keeps a clear distinction between

(a) a sequence of random variables that converges in probability to a constant, as contrasted to

(b) a sequence of random variables that converges in probability to a random variable (and hence in distribution to it).

But in “Wise Man’s Asymptotics”, we can also have the case of

(c) a sequence of random variables that converges in probability to a constant while maintaining a non-zero variance at the limit.

My question is (stealing from my own exploratory answer below):

How can we understand an estimator that is asymptotically consistent but *also* has a non-zero, finite variance? What does this variance reflect? How does its behavior differ from that of a “usual” consistent estimator?

Threads related to the phenomenon described in (c) (look also in the comments):

**Answer**

I won’t give a very satisfactory answer to your question because it seems a little too open-ended to me, but let me try to shed some light on why this question is a hard one.

I think you are struggling with the fact that the conventional topologies we use on probability distributions and random variables are bad. I’ve written a bigger piece about this on my blog, but let me try to summarize: a sequence can converge in the weak (and the total-variation) sense while violating commonsense assumptions about what convergence means.

For example, a sequence can converge in the weak topology towards a constant while having variance equal to 1 (which is exactly what your $Z_n$ sequence is doing). There is then a limit distribution (in the weak topology) that is this monstrous random variable, which is equal to 0 most of the time but, infinitesimally rarely, equal to infinity.
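To make this concrete, here is a small exact computation for one sequence with this property. The definition of $Z_n$ from the question is not reproduced here, so the sequence below is an assumption chosen for illustration: $Z_n = n$ with probability $1/n^2$ and $Z_n = 0$ otherwise. Then $P(|Z_n| > \varepsilon) = 1/n^2 \to 0$ for every $\varepsilon > 0$ (convergence in probability to 0), while $E[Z_n^2] = 1$ and $\operatorname{Var}(Z_n) = 1 - 1/n^2 \to 1$. The helper name `z_n_moments` is hypothetical.

```python
from fractions import Fraction


def z_n_moments(n: int) -> tuple[Fraction, Fraction, Fraction]:
    """Exact P(Z_n != 0), E[Z_n] and Var(Z_n) for the hypothetical sequence
    Z_n = n with probability 1/n**2, and Z_n = 0 otherwise."""
    p = Fraction(1, n * n)        # P(Z_n = n), the only non-zero outcome
    mean = n * p                  # E[Z_n] = n * 1/n^2 = 1/n
    second_moment = n * n * p     # E[Z_n^2] = n^2 * 1/n^2 = 1
    variance = second_moment - mean ** 2
    return p, mean, variance


for n in (10, 100, 1000):
    p, mean, var = z_n_moments(n)
    print(f"n={n:>5}  P(Z_n != 0)={float(p):.1e}  "
          f"E[Z_n]={float(mean):.3f}  Var(Z_n)={float(var):.6f}")
```

The probability of a non-zero value vanishes like $1/n^2$, but the single rare outcome grows like $n$, which is just enough to keep the second moment pinned at 1.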

I personally take this to mean that the weak topology (and the total-variation topology too) is a poor notion of convergence that should be discarded. Most of the convergences we actually use are stronger than that. However, I don’t really know what we should use instead of the weak topology, sooo…

If you really want to find an essential difference between $\hat\theta = \bar X + Z_n$ and $\tilde\theta = \bar X$, here is my take: both estimators are asymptotically equivalent under the 0-1 loss (when the size of your mistake doesn’t matter). However, $\tilde\theta$ is much better if the size of your mistakes matters, because $\hat\theta$ sometimes fails catastrophically.
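A rough numerical sketch of this comparison follows. It rests on assumptions that are not in the original: the $X_i$ are i.i.d. $N(\theta, 1)$, so $\bar X - \theta \sim N(0, 1/n)$; $Z_n = n$ with probability $1/n^2$ and $0$ otherwise; and $Z_n$ is independent of the sample. The 0-1 loss is taken as $\mathbf{1}\{|\text{error}| > \varepsilon\}$ with a hypothetical tolerance `eps = 0.1`. Under these assumptions the two 0-1 risks differ by at most $1/n^2$, while the mean squared error is $1/n$ for $\tilde\theta$ but $1/n + E[Z_n^2] = 1/n + 1$ for $\hat\theta$. The function names are illustrative only.

```python
import math


def upper_tail(x: float) -> float:
    """P(N(0,1) > x), computed with erfc to stay accurate far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def risks(n: int, eps: float = 0.1) -> dict[str, float]:
    """0-1 risk P(|error| > eps) and mean squared error of tilde_theta = Xbar
    and hat_theta = Xbar + Z_n, under the hypothetical setup described above."""
    sd = 1.0 / math.sqrt(n)   # Xbar - theta ~ N(0, 1/n)
    p = 1.0 / n**2            # P(Z_n = n)

    def p_miss(shift: float) -> float:
        # P(|N(shift, sd^2)| > eps) = P(N(0,1) < (-eps - shift)/sd) + P(N(0,1) > (eps - shift)/sd)
        return upper_tail((eps + shift) / sd) + upper_tail((eps - shift) / sd)

    return {
        "tilde_01": p_miss(0.0),
        # With prob. 1 - p the error is Xbar - theta; with prob. p it is shifted by n.
        "hat_01": (1 - p) * p_miss(0.0) + p * p_miss(float(n)),
        "tilde_mse": sd**2,              # Var(Xbar) = 1/n
        "hat_mse": sd**2 + p * n**2,     # 1/n + E[Z_n^2]; cross term is 0 by independence
    }


for n in (10, 100, 1000):
    r = risks(n)
    print(f"n={n:>4}  0-1 risk: tilde={r['tilde_01']:.4f} hat={r['hat_01']:.4f}  "
          f"MSE: tilde={r['tilde_mse']:.4f} hat={r['hat_mse']:.4f}")
```

Under these assumptions both 0-1 risks shrink to zero together, while the MSE of $\hat\theta$ stays near 1 for every $n$: the rare, huge value of $Z_n$ is invisible to the 0-1 loss but dominates any loss that grows with the size of the error.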

**Attribution**

*Source: Link, Question Author: Alecos Papadopoulos, Answer Author: Alecos Papadopoulos*