The issue has come up before, but I want to ask a specific question in the hope of eliciting an answer that will clarify (and classify) it:
In “Poor Man’s Asymptotics”, one keeps a clear distinction between
- (a) a sequence of random variables that converges in probability to a constant
as contrasted to
- (b) a sequence of random variables that converges in probability to a random variable (and hence in distribution to it).
But in “Wise Man’s Asymptotics”, we can also have the case of
- (c) a sequence of random variables that converges in probability to a constant while maintaining a non-zero variance at the limit.
My question is (stealing from my own exploratory answer below):
How can we understand an estimator that is asymptotically consistent but also has a non-zero, finite variance? What does this variance reflect? How does its behavior differ from that of a “usual” consistent estimator?
Threads related to the phenomenon described in (c) (look also in the comments):
I won’t give a very satisfactory answer to your question because it seems to me to be a little bit too open, but let me try to shed some light on why this question is a hard one.
I think you are struggling with the fact that the conventional topologies we use on probability distributions and random variables are bad. I’ve written a bigger piece about this on my blog but let me try to summarize: you can converge in the weak (and the total-variation) sense while violating commonsensical assumptions about what convergence means.
For example, you can converge in the weak topology towards a constant while having variance = 1 (which is exactly what your $Z_n$ sequence is doing). There is then a limit distribution (in the weak topology) that is this monstrous random variable which is most of the time equal to 0 but infinitesimally rarely equal to infinity.
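To make this concrete, here is a minimal sketch. The sequence $Z_n$ is not defined in this post, so the code assumes a hypothetical construction that behaves as described: $Z_n = n$ with probability $1/n^2$ and $Z_n = 0$ otherwise. Under that assumption the probability of a non-zero value vanishes while the variance tends to 1. The function name `z_n_summary` is made up for illustration.

```python
from fractions import Fraction


def z_n_summary(n):
    """Exact moments of an ASSUMED Z_n: equals n with prob. 1/n^2, else 0."""
    p = Fraction(1, n * n)      # P(Z_n != 0), goes to 0
    mean = n * p                # E[Z_n] = 1/n, goes to 0
    second = n * n * p          # E[Z_n^2] = 1 for every n
    var = second - mean ** 2    # Var(Z_n) = 1 - 1/n^2, goes to 1
    return p, mean, var


for n in (2, 10, 100, 1000):
    p, mean, var = z_n_summary(n)
    print(f"n={n:5d}  P(Z_n != 0)={float(p):.6f}  "
          f"E[Z_n]={float(mean):.4f}  Var(Z_n)={float(var):.6f}")
```

So $Z_n \to 0$ in probability (the first column shrinks), yet the variance column does not shrink at all: all of it comes from an increasingly rare but increasingly large jump.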
I personally take this to mean that the weak topology (and the total-variation topology too) is a poor notion of convergence that should be discarded. Most of the convergences we actually use are stronger than that. However, I don’t really know what we should use instead of the weak topology sooo …
If you really want to find an essential difference between $\hat \theta = \bar X + Z_n$ and $\tilde \theta = \bar X$, here is my take: both estimators are equivalent for the [0,1]-loss (when the size of your mistake doesn’t matter). However, $\tilde \theta$ is much better if the size of your mistakes matters, because $\hat \theta$ sometimes fails catastrophically.
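The contrast between the two losses can be sketched numerically. Everything below rests on assumptions that are not in the post: the $X_i$ are i.i.d. normal with standard deviation `sigma`, $Z_n$ is independent of them and equals $n$ with probability $1/n^2$ (0 otherwise), and the 0-1 loss is read as "the error exceeds a tolerance `eps`". The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def compare(n, sigma=1.0, eps=0.5):
    """Risks of tilde = Xbar and hat = Xbar + Z_n under the stated assumptions."""
    s = sigma / math.sqrt(n)    # standard deviation of the sample mean
    p = 1.0 / n ** 2            # assumed P(Z_n = n)

    # 0-1 risk of tilde: P(|Xbar - theta| > eps)
    miss_tilde = math.erfc(eps / (s * math.sqrt(2.0)))

    # 0-1 risk of hat: condition on whether the rare jump Z_n = n happened
    miss_given_jump = 1.0 - (normal_cdf((eps - n) / s)
                             - normal_cdf((-eps - n) / s))
    miss_hat = (1.0 - p) * miss_tilde + p * miss_given_jump

    # Squared-error risk: independence removes the cross term,
    # and E[Z_n^2] = n^2 * (1/n^2) = 1 for every n
    mse_tilde = sigma ** 2 / n
    mse_hat = sigma ** 2 / n + 1.0
    return miss_tilde, miss_hat, mse_tilde, mse_hat


for n in (10, 100, 1000):
    mt, mh, st, sh = compare(n)
    print(f"n={n:5d}  0-1 risk: tilde={mt:.2e} hat={mh:.2e}   "
          f"MSE: tilde={st:.4f} hat={sh:.4f}")
```

Under these assumptions the two 0-1 risks differ by at most $1/n^2$, so they vanish together, while the squared-error risk of $\hat \theta$ stays above 1 no matter how large $n$ gets.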