A sequence of estimators U_n for a parameter θ is asymptotically normal if √n(U_n − θ) → N(0, v) in distribution. (source) We then call v the asymptotic variance of U_n. If this variance is equal to the Cramér-Rao bound, we say the estimator/sequence is asymptotically efficient.
Question: Why do we use √n in particular?
I know that for the sample mean, Var(X̄) = σ²/n, so this choice normalizes it. But since the definitions above apply to more than the sample mean, why do we still choose to normalize by √n?
We don’t get to choose here. The “normalizing” factor is, in essence, a “variance-stabilizing to something finite” factor: it keeps the expression from going to zero or to infinity as the sample size goes to infinity, so that a distribution is maintained in the limit.
So it has to be whatever it has to be in each case. Of course, it is interesting that in many cases it emerges that it has to be √n (but see also @whuber’s comment below).
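To make the "not to zero, not to infinity" point concrete, here is a minimal simulation sketch for the sample mean. The Exponential(1) data, the sample sizes, and the helper name `scaled_spread` are my own illustrative choices, not anything from the question. It measures the spread of n^power · (X̄ − μ) for a few candidate powers:

```python
import numpy as np

def scaled_spread(n, power, reps=2000, seed=0):
    """Std. dev. across replications of n**power * (sample mean - true mean)."""
    rng = np.random.default_rng(seed)
    # Exponential(1) draws: true mean 1 and variance 1 (an arbitrary choice).
    x = rng.exponential(scale=1.0, size=(reps, n))
    return float(np.std(n**power * (x.mean(axis=1) - 1.0)))

if __name__ == "__main__":
    for power in (0.25, 0.5, 1.0):
        spreads = [round(scaled_spread(n, power), 3) for n in (100, 1600)]
        print(f"power={power}: spread at n=100 and n=1600 -> {spreads}")
```

In this setup, only power 0.5 should give a spread that stays roughly constant as n grows; a smaller power lets it shrink toward zero and a larger one lets it blow up.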
A standard example where the normalizing factor has to be n, rather than √n, is when we have a model
y_t = β y_{t−1} + u_t
with u_t white noise, and we estimate the unknown β by Ordinary Least Squares.
If it so happens that the true value of the coefficient satisfies |β|<1, then the OLS estimator is consistent and converges at the usual √n rate.
But if instead the true value is β=1 (i.e., we have in reality a pure random walk), then the OLS estimator is consistent but will converge "faster", at rate n (this is sometimes called a "superconsistent" estimator, since, I guess, so many estimators converge at rate √n).
In this case, to obtain its (non-normal) asymptotic distribution, we have to scale (β̂ − β) by n (if we scale only by √n, the expression will go to zero). Hamilton ch 17 has the details.
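A rough way to see the two rates is to simulate. The sketch below assumes the model is the AR(1) regression y_t = β y_{t−1} + u_t with Gaussian white noise, y_0 = 0, and OLS without an intercept. Those details, and the function name `ols_ar1_errors`, are my assumptions for illustration only.

```python
import numpy as np

def ols_ar1_errors(beta, n, reps=2000, seed=0):
    """Simulate y_t = beta*y_{t-1} + u_t with y_0 = 0; return beta_hat - beta per path."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, n))  # Gaussian white noise (assumed)
    y = np.zeros((reps, n + 1))
    for t in range(1, n + 1):
        y[:, t] = beta * y[:, t - 1] + u[:, t - 1]
    lagged, current = y[:, :-1], y[:, 1:]
    # OLS slope without intercept: sum(y_{t-1} * y_t) / sum(y_{t-1}^2)
    beta_hat = (lagged * current).sum(axis=1) / (lagged**2).sum(axis=1)
    return beta_hat - beta

if __name__ == "__main__":
    for beta in (0.5, 1.0):
        for n in (200, 800):
            err = ols_ar1_errors(beta, n)
            print(f"beta={beta}, n={n}: "
                  f"sd of sqrt(n)*err = {np.std(np.sqrt(n) * err):.3f}, "
                  f"sd of n*err = {np.std(n * err):.3f}")
```

With |β|<1 the √n-scaled errors should keep a roughly stable spread, while with β=1 it is the n-scaled errors that stabilize and the √n-scaled ones that shrink. This only illustrates the rates; it says nothing about the shape of the limiting distribution, for which Hamilton is the place to look.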