# Consistency of M-estimator based on plug-in estimator?

Suppose we estimate a quantity $\theta_0$ by $\tilde{\theta} = \hat{\theta}(\eta_0)$, the solution of the estimating equation

$$S_n(\tilde{\theta}, \eta_0) = 0$$

where $\eta_0$ is a nuisance parameter that is known. Suppose that the assumptions of the M-estimator are satisfied, and

$$\tilde{\theta} \xrightarrow{p}\ \theta_0$$

so that consistency is achieved.

Question: Suppose we do not know $\eta_0$, but we have a consistent estimator $\hat{\eta}$ of $\eta_0$. If we now set $\hat{\theta} = \hat{\theta}(\hat{\eta})$, under which conditions is $\hat{\theta}$ consistent for $\theta_0$?

Clearly, if $\hat{\eta}$ is itself obtained from an estimating equation, we can stack all the estimating equations and consistency follows automatically.

However, what if $\hat{\eta}$ is not obtained through an estimating equation?
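To make the setting concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not part of the question: the estimating function `s_n` (the mean of $\tanh((X_i - \theta)/\eta)$), normal data with location $\theta_0 = 2$ and scale $\eta_0 = 1$, and a plug-in $\hat{\eta}$ given by the scaled median absolute deviation, treated here as a black-box consistent estimator. It compares $\tilde{\theta} = \hat{\theta}(\eta_0)$ with $\hat{\theta} = \hat{\theta}(\hat{\eta})$ as $n$ grows.

```python
import math
import random
import statistics


def s_n(theta, eta, xs):
    """Hypothetical estimating function S_n(theta, eta): mean of tanh((x - theta) / eta)."""
    return sum(math.tanh((x - theta) / eta) for x in xs) / len(xs)


def solve_theta(eta, xs, tol=1e-10):
    """Bisection root of theta -> S_n(theta, eta), which is decreasing in theta."""
    lo, hi = min(xs), max(xs)  # S_n(lo) >= 0 >= S_n(hi), so the root is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s_n(mid, eta, xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mad_scale(xs):
    """Plug-in eta_hat: 1.4826 * MAD, which is consistent for sigma under normality."""
    med = statistics.median(xs)
    return 1.4826 * statistics.median([abs(x - med) for x in xs])


def run(n, seed=0, theta0=2.0, eta0=1.0):
    """Return (theta_tilde, theta_hat, eta_hat) for one simulated sample of size n."""
    rng = random.Random(seed)
    xs = [rng.gauss(theta0, eta0) for _ in range(n)]
    eta_hat = mad_scale(xs)
    return solve_theta(eta0, xs), solve_theta(eta_hat, xs), eta_hat


results = {n: run(n) for n in (200, 2000, 20000)}
for n, (theta_tilde, theta_hat, eta_hat) in results.items():
    print(n, round(theta_tilde, 4), round(theta_hat, 4), round(eta_hat, 4))
```

In this toy model both estimates should approach $\theta_0$ as $n$ increases. A simulation only suggests consistency; it does not replace the conditions discussed below.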

Background: For consistency when $\eta_0$ is known, we typically need a function $S(\theta, \eta)$ such that, for every $\delta > 0$, we have

$$\sup_{\theta \in \Theta} \frac{| S_n(\theta,\eta_0) - S(\theta,\eta_0) |}{1 + | S_n(\theta,\eta_0)| + |S(\theta,\eta_0) |} \xrightarrow{p}\ 0$$
$$\inf_{|\theta - \theta_0| > \delta}| S(\theta,\eta_0) | > 0 = |S(\theta_0, \eta_0)|$$

with $S_n(\tilde{\theta},\eta_0) = o_p(1)$.

Note that a more restrictive version of the first assumption is

$$\sup_{\theta \in \Theta} | S_n(\theta,\eta_0) - S(\theta,\eta_0) | \xrightarrow{p}\ 0$$

From the infimum condition, for any $\delta >0$ we have an $\epsilon > 0$ such that

$$P\left( \left| \tilde{\theta} - \theta_0 \right| > \delta \right) \le P\left( \left| S(\tilde{\theta},\eta_0) \right| \ge \epsilon \right)$$

Consistency can then be proved through

\begin{align}| S(\tilde{\theta},\eta_0) | &\le | S_n(\tilde{\theta}, \eta_0) | + |S(\tilde{\theta},\eta_0) - S_n(\tilde{\theta}, \eta_0) | \\ &\le o_p(1) + o_p(1+|S_n(\tilde{\theta},\eta_0)| + |S(\tilde{\theta}, \eta_0)|) \\ &= o_p(1 + |S(\tilde{\theta}, \eta_0)|) = o_p(1) \end{align}

Hence $P\left( | S(\tilde{\theta},\eta_0) | \ge \epsilon \right) \to 0$ which proves consistency.

Solution:

Suppose that in addition to the previous assumptions, either

(1) $S_n(\theta,\eta)$ is stochastically continuous uniformly in $\theta$ with respect to $\eta$ at $\eta_0$

or

(2) $S(\theta,\eta)$ is continuous uniformly in $\theta$ with respect to $\eta$ at $\eta_0$

with $S_n(\hat{\theta},\hat{\eta}) = o_p(1)$.
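As an illustration of how (2) can be checked, consider a toy example that is assumed here and is not part of the question: $X_1, \dots, X_n$ i.i.d. with mean $\mu$ and variance $\sigma^2$, nuisance parameter $\eta_0 = \mu$, target $\theta_0 = \sigma^2$, and

$$S_n(\theta,\eta) = \frac{1}{n}\sum_{i=1}^n (X_i - \eta)^2 - \theta, \qquad S(\theta,\eta) = \sigma^2 + (\mu - \eta)^2 - \theta$$

The dependence on $\eta$ is free of $\theta$, so

$$\sup_{\theta \in \Theta} | S(\theta,\eta) - S(\theta,\eta_0) | = (\eta - \eta_0)^2 \to 0 \quad \text{as } \eta \to \eta_0$$

and (2) holds. In this example $\hat{\theta}(\hat{\eta}) = \frac{1}{n}\sum_{i=1}^n (X_i - \hat{\eta})^2$ can also be checked directly to converge in probability to $\sigma^2$ for any consistent $\hat{\eta}$.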

If (1) is true the proof is short. By the triangle inequality,

\begin{align} |S_n(\hat{\theta},\eta_0)| &\le |S_n(\hat{\theta},\hat{\eta})| + |S_n(\hat{\theta},\hat{\eta}) - S_n(\hat{\theta},\eta_0)| \\ &\le |S_n(\hat{\theta},\hat{\eta})| + \sup_{\theta \in \Theta}|S_n(\theta,\hat{\eta}) - S_n(\theta,\eta_0)| \\ &= o_p(1) + o_p(1) \end{align}
where the first term is $o_p(1)$ by the definition of $\hat{\theta}$ and the second is $o_p(1)$ because of (1) and the consistency of $\hat{\eta}$.

We conclude that $\hat{\theta}$ also satisfies $S_n(\hat{\theta},\eta_0) = o_p(1)$, so the argument in the background applies directly.
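Condition (1) can be probed numerically by approximating $\sup_{\theta \in \Theta}|S_n(\theta,\hat{\eta}) - S_n(\theta,\eta_0)|$ on a grid. The sketch below is only an illustration under assumed ingredients that do not come from the question: a hypothetical estimating function `s_n` equal to the mean of $\tanh((X_i - \theta)/\eta)$, normal data with $\theta_0 = 2$ and $\eta_0 = 1$, the scaled median absolute deviation as $\hat{\eta}$, and a finite grid standing in for $\Theta = [-3, 7]$.

```python
import math
import random
import statistics


def s_n(theta, eta, xs):
    """Hypothetical estimating function S_n(theta, eta): mean of tanh((x - theta) / eta)."""
    return sum(math.tanh((x - theta) / eta) for x in xs) / len(xs)


def sup_gap(xs, eta_hat, eta0, grid):
    """Grid approximation of sup_theta |S_n(theta, eta_hat) - S_n(theta, eta0)|."""
    return max(abs(s_n(t, eta_hat, xs) - s_n(t, eta0, xs)) for t in grid)


rng = random.Random(1)
theta0, eta0 = 2.0, 1.0                              # assumed true values
grid = [theta0 - 5.0 + 0.25 * k for k in range(41)]  # grid over Theta = [-3, 7]

gaps = {}
for n in (100, 1000, 10000):
    xs = [rng.gauss(theta0, eta0) for _ in range(n)]
    med = statistics.median(xs)
    # Plug-in scale: 1.4826 * MAD, consistent for sigma under normality
    eta_hat = 1.4826 * statistics.median([abs(x - med) for x in xs])
    gaps[n] = sup_gap(xs, eta_hat, eta0, grid)
    print(n, round(eta_hat, 4), round(gaps[n], 5))
```

A gap that shrinks as $n$ grows is what (1) asks for. The grid only approximates the supremum, so this is a sanity check and not a proof.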

If (2) is true, the infimum condition combined with (2) implies that for any $\delta > 0$ there exist $\epsilon_1 > 0$ and $\epsilon_2 > 0$ such that

$$\inf_{\theta :|\theta-\theta_0| > \delta}\inf_{|\eta -\eta_0| \le \epsilon_2 }| S(\theta,\eta) | > \epsilon_1$$

Therefore, we have

$$P\left( \left| \hat{\theta} - \theta_0 \right| > \delta \right) \le P\left( \left| S(\hat{\theta},\hat{\eta}) - S(\theta_0,\hat{\eta}) \right| > \epsilon_1 \right) + P(|\hat{\eta} - \eta_0| > \epsilon_2)$$

The last term goes to zero as $n \to \infty$ because $\hat{\eta}$ is consistent for $\eta_0$.

Then, we have

\begin{align} | S(\hat{\theta},\hat{\eta}) - S(\theta_0,\hat{\eta}) | &\le |S(\hat{\theta},\eta_0) - S(\theta_0,\eta_0)| \\ &+ |S(\hat{\theta},\hat{\eta}) - S(\hat{\theta},\eta_0)| + |S( \theta_0,\hat{\eta}) - S(\theta_0,\eta_0)| \\ &\le o_p(1) + 2\sup_{\theta \in \Theta}|S( \theta,\hat{\eta}) - S(\theta,\eta_0)| \\ &= o_p(1) \end{align}

where the last line is true because of (2).