For what models does the bias of MLE fall faster than the variance?

Let \hat\theta be a maximum likelihood estimate of a true parameter \theta^* of some model. As the number of data points n increases, the error \lVert \hat\theta - \theta^*\rVert typically decreases as O(1/\sqrt n). Using the triangle inequality and properties of the expectation, it’s possible to show that this error rate implies that both the “bias” \lVert \mathbb E\hat\theta - \theta^*\rVert and the “deviation” \lVert \mathbb E\hat\theta - \hat\theta\rVert are also O(1/\sqrt{n}). Of course, it is possible for models to have bias that shrinks at a faster rate. Many models (like ordinary least squares regression) have no bias.
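One way to sketch that argument, assuming the error rate is meant in expectation, i.e. \mathbb E\lVert \hat\theta - \theta^*\rVert = O(1/\sqrt n) (this reading is an assumption; the rate could also be stated in probability). For the bias, Jensen's inequality for the norm gives

\lVert \mathbb E\hat\theta - \theta^*\rVert = \lVert \mathbb E(\hat\theta - \theta^*)\rVert \leq \mathbb E\lVert \hat\theta - \theta^*\rVert = O(1/\sqrt n)

and for the deviation, the triangle inequality followed by the bound above gives

\mathbb E\lVert \hat\theta - \mathbb E\hat\theta\rVert \leq \mathbb E\lVert \hat\theta - \theta^*\rVert + \lVert \theta^* - \mathbb E\hat\theta\rVert \leq 2\,\mathbb E\lVert \hat\theta - \theta^*\rVert = O(1/\sqrt n)

These are upper bounds only, so either term may shrink faster than O(1/\sqrt n).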

I’m interested in models that have bias that shrinks faster than O(1/\sqrt n), but where the error does not shrink at this faster rate because the deviation still shrinks as O(1/\sqrt n). In particular, I’d like to know sufficient conditions for a model’s bias to shrink at the rate O(1/n).

Answer

In general, you need models where the MLE is not asymptotically normal but converges to some other distribution (and does so at a faster rate). This usually happens when the parameter being estimated lies on the boundary of the parameter space. Intuitively, the MLE then approaches the parameter “only from one side”, so it “improves on convergence speed” because it is not “distracted” by going “back and forth” around the parameter.

A standard example is the MLE for \theta in an i.i.d. sample of U(0,\theta) uniform r.v.’s. The MLE here is the maximum order statistic,

\hat \theta_n = u_{(n)}

Its finite-sample distribution function and density are, for 0 \leq t \leq \theta,

F_{\hat \theta_n}(t) = \frac {t^n}{\theta ^n},\;\;\; f_{\hat \theta_n}(t)=n\frac {t^{n-1}}{\theta ^n}

so that

\mathbb E(\hat \theta_n) = \frac {n}{n+1}\theta \implies B(\hat \theta_n) = -\frac {1}{n+1}\theta

So B(\hat \theta_n) = O(1/n). But the same faster rate also holds for the deviation, since the standard deviation of \hat \theta_n is O(1/n) as well.
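To check the rate of the variance, the second moment can be computed from the density of the maximum, n t^{n-1}/\theta^n on [0,\theta] (t is just a dummy integration variable introduced here):

\mathbb E(\hat \theta_n^2) = \int_0^\theta t^2 \, n\frac {t^{n-1}}{\theta^n}\,dt = \frac {n}{n+2}\theta^2

\operatorname{Var}(\hat \theta_n) = \frac {n}{n+2}\theta^2 - \left(\frac {n}{n+1}\right)^2\theta^2 = \frac {n\,\theta^2}{(n+1)^2(n+2)} = O(1/n^2)

Taking the square root gives a deviation of order 1/n, the same order as the bias.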

One can also verify that to obtain a limiting distribution, we need to look at the variable n(\theta - \hat \theta_n) (i.e., we need to scale by n), since for z \geq 0

P[n(\theta - \hat \theta_n)\leq z] = 1-P[\hat \theta_n\leq \theta - (z/n)]

=1-\frac 1 {\theta^n}\cdot \left(\theta + \frac{-z}{n}\right)^n = 1-\frac {\theta^n} {\theta^n}\cdot \left(1 + \frac{-z/\theta}{n}\right)^n

\to 1- e^{-z/\theta}

which is the CDF of the Exponential distribution with mean \theta.
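The rates above can be checked numerically. What follows is only a minimal Monte Carlo sketch, not part of the original answer: the function names `simulate_uniform_mle` and `summarize`, and the values theta = 2, 5000 replications, and z = 1, are arbitrary choices made for illustration. It uses only the Python standard library.

```python
import math
import random
import statistics


def simulate_uniform_mle(n, theta=2.0, reps=5000, seed=0):
    """Draw `reps` samples of size n from U(0, theta); return the MLEs (sample maxima)."""
    rng = random.Random(seed)
    return [max(rng.uniform(0.0, theta) for _ in range(n)) for _ in range(reps)]


def summarize(n, theta=2.0, reps=5000, seed=0, z=1.0):
    """Return (bias, sd, empirical P[n(theta - mle) <= z]) estimated by simulation."""
    mles = simulate_uniform_mle(n, theta, reps, seed)
    bias = statistics.fmean(mles) - theta      # Monte Carlo estimate of E(mle) - theta
    sd = statistics.pstdev(mles)               # spread of the MLE around its own mean
    cdf_at_z = sum(n * (theta - m) <= z for m in mles) / reps
    return bias, sd, cdf_at_z


if __name__ == "__main__":
    theta, z = 2.0, 1.0  # illustrative values, not from the original answer
    print(f"limiting CDF at z={z}: {1 - math.exp(-z / theta):.4f}")
    for n in (10, 40, 160):
        bias, sd, cdf_at_z = summarize(n, theta, z=z)
        print(f"n={n:3d}  n*bias={n * bias:+.3f}  n*sd={n * sd:.3f}  "
              f"P[n(theta-mle)<=z]={cdf_at_z:.4f}")
```

If the derivation is right, n*bias should settle near -\theta and n*sd near \theta as n grows (both quantities are O(1/n) before rescaling), and the empirical probability should approach 1 - e^{-z/\theta}.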

I hope this provides some direction.

Attribution
Source : Link , Question Author : Mike Izbicki , Answer Author : Alecos Papadopoulos
