# Does the definition of regular estimator depend on the rate of convergence? If not, should it?

The definition of regular estimator in my lecture notes is:

Let $X_1^{(n)}, \dots, X_n^{(n)} \overset{iid}{\sim} P_n \in \mathcal{P}(\Theta)$, where $\mathcal{P}(\Theta)$ is a regular parametric model with $\Theta \subseteq \mathbb{R}^d$, and let $\beta: \Theta \to \mathbb{R}^k$ be a continuously differentiable function of $\theta$. Then an estimator $\hat{\beta}_n = \hat{\beta}_n(X_1^{(n)}, \dots, X_n^{(n)})$ is called a regular estimator of $\beta(\theta)$ at $\theta = \theta^*$ if for every $h \in \mathbb{R}^d$: $$\sqrt{n}\left(\hat{\beta}_n - \beta(\theta^* + h/\sqrt{n}) \right) \underset{P_{\theta^* + h/\sqrt{n}}}{\overset{d}{\longrightarrow}} G_{\theta^*} \,,$$ for some distribution $G_{\theta^*}$ depending only on $\theta^*$ and not on $h$.
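For intuition (my own sketch, not part of the notes): in the Gaussian location model $N(\theta, 1)$ the sample mean is regular, since under $P_{\theta^* + h/\sqrt{n}}$ the quantity $\sqrt{n}\left(\bar{X}_n - (\theta^* + h/\sqrt{n})\right)$ is exactly $N(0,1)$ for every $h$. A quick simulation:

```python
import numpy as np

# Sketch (not from the notes): the sample mean in a N(theta, 1) model is a
# regular estimator -- the law of sqrt(n) * (mean - (theta* + h/sqrt(n)))
# under the local alternative P_{theta* + h/sqrt(n)} does not depend on h.
rng = np.random.default_rng(0)
n, theta_star, reps = 400, 1.0, 20_000

for h in (0.0, 2.0, -3.0):                        # different local offsets h
    theta_n = theta_star + h / np.sqrt(n)         # local parameter theta* + h/sqrt(n)
    x = rng.normal(theta_n, 1.0, size=(reps, n))  # samples under P_{theta_n}
    z = np.sqrt(n) * (x.mean(axis=1) - theta_n)   # centred, scaled estimator
    # In each case z is (exactly) N(0, 1): the same limit G for every h.
    print(f"h={h:+.1f}: mean={z.mean():+.3f}, sd={z.std():.3f}")
```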

(The definition of regular parametric model is included below for completeness.)

This definition seems to be implicitly assuming that the rate at which $\hat{\beta}_n$ converges to $\beta({\theta^*})$ is $O_P(n^{-1/2})$, since the “localization around $\theta^*$” is given in terms of that scaling.

Question: Would it not make more sense for the definition of regular estimator, in order for it to be generally applicable, to take into account the rate of convergence of the estimator?

E.g. if an estimator converges at rate $O_P(n^{-1/3})$, as occurs for estimators of a mode, then shouldn't the invariance of its asymptotic distribution be considered under localizations with scaling $n^{-1/3}$?

(“Localization” of scaling $f(n)$ is what I call replacing $\beta(\theta^*)$ by $\beta(\theta^* + h \cdot f(n))$.)

Yes, Hodges' estimator converges faster than rate $n^{-1/2}$ at some points, although not at "most" of them, so its rate wouldn't be $n^{-1/2}$ unless one used the "minimax" or worst-case rate. For the sake of simplicity, replace "rate" everywhere above with "minimax/worst-case rate"; then Hodges' estimator would still fail to be regular with scaling $n^{-1/2}$ under the proposed more general definition.
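To make the non-regularity concrete (my own sketch, using the standard Hodges construction $\hat{\theta}_H = \bar{X}_n \cdot \mathbb{1}\{|\bar{X}_n| > n^{-1/4}\}$ in the $N(\theta, 1)$ model): under $\theta_n = h/\sqrt{n}$ the law of $\sqrt{n}(\hat{\theta}_H - \theta_n)$ concentrates at $-h$, so the limit depends on $h$.

```python
import numpy as np

# Sketch (standard Hodges construction, my own simulation): in the N(theta, 1)
# model, Hodges' estimator truncates the sample mean to 0 when it is small:
#   theta_hat = xbar if |xbar| > n**(-1/4) else 0.
# Under the local alternative theta_n = h/sqrt(n), the threshold is essentially
# never crossed for large n, so sqrt(n) * (theta_hat - theta_n) concentrates at
# -h: the limit depends on h, hence the estimator is not regular at theta* = 0.
rng = np.random.default_rng(1)
n, reps = 10_000, 100_000

for h in (0.0, 1.0, 2.0):
    theta_n = h / np.sqrt(n)
    # Draw the sample mean directly: xbar ~ N(theta_n, 1/n).
    xbar = rng.normal(theta_n, 1.0 / np.sqrt(n), size=reps)
    hodges = np.where(np.abs(xbar) > n ** (-0.25), xbar, 0.0)  # truncate small means
    z = np.sqrt(n) * (hodges - theta_n)
    print(f"h={h:.1f}: mean of z = {z.mean():+.3f}  (close to -h)")
```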

• A parametric model $\mathcal{P}(\Theta)$ with $\Theta \subseteq \mathbb{R}^d$ will be called regular if the following conditions hold:
1. $\Theta$ is an open subset of $\mathbb{R}^d$.
2. The map $\theta \mapsto p(x, \theta)$ is continuously differentiable for all $x$.
3. The entries of $I(\theta)$ (i.e. of $\operatorname{Var}_{P_{\theta}}\!\left[\frac{\partial}{\partial \theta} \log p(X, \theta)\right]$) exist and are continuous functions of $\theta$.
4. The information matrix $I(\theta)$ is non-singular.
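As a concrete check (my own example, not from the notes), the Gaussian location model $P_\theta = N(\theta, 1)$ with $\Theta = \mathbb{R}$ satisfies all four conditions:

```latex
% Example (not from the notes): the Gaussian location model satisfies 1-4.
\[
  p(x,\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^2/2},
  \qquad
  \frac{\partial}{\partial\theta} \log p(x,\theta) = x - \theta .
\]
% 1. Theta = R is open.
% 2. theta -> p(x, theta) is smooth (in fact C^infinity) for every x.
% 3. I(theta) = Var_{P_theta}(X - theta) = 1 exists and is constant,
%    hence continuous in theta.
% 4. I(theta) = 1 is non-singular.
```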

This paper by Van der Vaart (section 27.3ff) looks at regularity with scaling rates other than $\sqrt{n}$. He argues that the point of regularity is basically to rule out superefficiency. On this view, you want the offsets $h/\sqrt{n}$ to be the ones that give contiguous sequences of distributions: the ones that are distinguishable from $h=0$, but not with power going to 1. So the $\sqrt{n}$ is the consistency rate of the efficient estimator. There's an example on p. 406 where the rate is $n$ rather than $\sqrt{n}$.

For smooth parametric models you get $\sqrt{n}$, but for models where the consistency rate is something else, you get something else.
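As a concrete instance of a non-$\sqrt{n}$ rate (my own sketch; I believe the classic illustration is the uniform endpoint model $\mathrm{Uniform}(0, \theta)$): the MLE $\max_i X_i$ satisfies $n(\theta - \max_i X_i) \overset{d}{\to} \mathrm{Exp}(\text{mean } \theta)$, so the contiguous localizations are $\theta + h/n$ rather than $\theta + h/\sqrt{n}$.

```python
import numpy as np

# Sketch (my own illustration of a non-sqrt(n) rate): for X_1..X_n iid
# Uniform(0, theta), the MLE max_i X_i converges at rate n:
#   n * (theta - max_i X_i)  -->d  Exponential with mean theta,
# so the natural local alternatives are theta + h/n, not theta + h/sqrt(n).
rng = np.random.default_rng(2)
n, theta, reps = 500, 2.0, 20_000

x = rng.uniform(0.0, theta, size=(reps, n))
z = n * (theta - x.max(axis=1))          # rescaled estimation error
print(f"mean of z = {z.mean():.3f}  (limit: theta = {theta})")
print(f"sd   of z = {z.std():.3f}  (limit: theta = {theta})")
```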