The problem I have is figuring out why the MLE is no longer consistent in countable parameter spaces under the conditions specified below.

The setup is as follows: we consider a parameter space \Theta \subseteq \mathbb{R} and a set of probability distributions, P = \{P_\theta : \theta \in \Theta\}.

We assume the following conditions hold:

1. The distributions P_\theta of the observations are distinct.
2. The distributions P_\theta have common support.
3. The observations are \mathbf X_n = \{X_1, \ldots, X_n\}, where the X_i are iid with probability density f(x_i|\theta) with respect to a sigma-finite measure \mu.
I take as given that the following theorem holds (see, for example, Theorem 6.1.1 in Hogg et al., Introduction to Mathematical Statistics, 7th ed.). Also, throughout, we denote by \theta_0 the "true" parameter that generated the data.

**Result 1.** Under conditions 1 to 3, the likelihood function

L(\theta|\mathbf X_n) = \prod_{i=1}^n f(X_i|\theta)

satisfies

P_{\theta_0}[L(\theta_0|\mathbf X_n) > L(\theta|\mathbf X_n)] \rightarrow 1 \quad \text{as } n \rightarrow \infty

for any fixed \theta \neq \theta_0.
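As a sanity check, Result 1 can be illustrated by simulation. The sketch below assumes a Bernoulli model with true parameter \theta_0 = 0.5 and a fixed alternative \theta = 0.3 (a toy example chosen here, not part of the original setup), and estimates P_{\theta_0}[L(\theta_0|\mathbf X_n) > L(\theta|\mathbf X_n)] by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_true_beats_alt(n, theta0=0.5, theta=0.3, reps=2000):
    """Estimate P_{theta0}[L(theta0|X_n) > L(theta|X_n)] by Monte Carlo."""
    x = rng.binomial(1, theta0, size=(reps, n))  # reps iid samples of size n
    # Compare log-likelihoods to avoid numerical underflow in the product.
    ll0 = (x * np.log(theta0) + (1 - x) * np.log(1 - theta0)).sum(axis=1)
    ll1 = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return float(np.mean(ll0 > ll1))

for n in (5, 50, 500):
    print(n, prob_true_beats_alt(n))
```

The estimated probability climbs toward one as n grows, matching the statement of Result 1.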

I want to show that under the same conditions, plus the assumption that the parameter space is finite, the estimator that maximizes the likelihood function is consistent. Further, I want to understand why this result breaks down when the parameter space is at least countably infinite.

So, here’s my attempted proof for the finite case.

As \Theta is finite, we may write \Theta = \{\theta_0, \theta_1, \ldots, \theta_m\}, where \theta_0 is the true parameter. Let

\hat\theta(\mathbf X_n) = \arg\max_{\theta \in \Theta} L(\theta|\mathbf X_n)

be the MLE. We want to show that there is a unique value \hat\theta(\mathbf X_n) that maximizes the likelihood and that it tends to \theta_0 in probability as n \rightarrow \infty.

For each 1 \le j \le m, let A_{jn} = \{\mathbf X_n : L(\theta_0|\mathbf X_n) > L(\theta_j|\mathbf X_n)\}. Then Result 1 shows that P_{\theta_0}[A_{jn}] \rightarrow 1 as n \rightarrow \infty for all j. What remains to be shown is that P_{\theta_0}[A_{1n} \cap A_{2n} \cap \cdots \cap A_{mn}] \rightarrow 1 as n \rightarrow \infty as well. It suffices to show that this holds for any A_{jn} \cap A_{j'n}, j \ne j' (as we can apply this result repeatedly, m-1 times). We have

P_{\theta_0}[A_{jn} \cap A_{j'n}] = 1 - P_{\theta_0}[A_{jn}^C \cup A_{j'n}^C] \ge 1 - P_{\theta_0}[A_{jn}^C] - P_{\theta_0}[A_{j'n}^C]

for all n, by the sub-additivity of the probability measure. Thus,

P_{\theta_0}[A_{jn} \cap A_{j'n}] \rightarrow 1

as n \rightarrow \infty, since P_{\theta_0}[A_{jn}^C] and P_{\theta_0}[A_{j'n}^C] converge to zero. Hence,

P_{\theta_0}[A_{1n}\cap A_{2n}\cap \cdots \cap A_{mn}] \rightarrow 1.

In other words, the probability that the likelihood function evaluated at \theta_0 is simultaneously (strictly) larger than its value at every other candidate parameter converges to one. By condition 1, this maximizer is unique with probability converging to one. Thus, by choosing the value of \theta that maximizes the likelihood function, i.e., \hat\theta(\mathbf X_n), we choose the true parameter value \theta_0 with probability converging to one. This is equivalent to the statement that

P_{\theta_0}[\hat\theta(\mathbf X_n) = \theta_0] \rightarrow 1

as n\rightarrow\infty. Thus, \hat\theta(\mathbf X_n) is consistent.
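The finite-space argument can also be checked numerically. Below is a small simulation sketch, again assuming a Bernoulli model, with the toy parameter space \Theta = \{0.3, 0.5, 0.7\} and \theta_0 = 0.5 (both assumptions chosen here for illustration); it estimates P_{\theta_0}[\hat\theta(\mathbf X_n) = \theta_0]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy finite parameter space (an assumption for illustration only).
thetas = np.array([0.3, 0.5, 0.7])
theta0 = 0.5

def prob_mle_correct(n, reps=2000):
    """Estimate P_{theta0}[hat_theta(X_n) = theta0] for Bernoulli data."""
    x = rng.binomial(1, theta0, size=(reps, n))
    s = x.sum(axis=1)  # number of successes is a sufficient statistic
    # Log-likelihood of every candidate theta, for every replication.
    ll = s[:, None] * np.log(thetas) + (n - s)[:, None] * np.log(1 - thetas)
    mle = thetas[np.argmax(ll, axis=1)]  # maximize over the finite space
    return float(np.mean(mle == theta0))

for n in (5, 50, 500):
    print(n, prob_mle_correct(n))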

Now, I wonder why this result breaks down when \Theta is assumed to be (countably) infinite. My attempt so far is the following:

Suppose that \Theta is countably infinite. Then,

\begin{aligned}
\lim_{n\rightarrow\infty} P_{\theta_0}\left[\bigcap_{j=1}^\infty A_{jn}\right] &= 1 - \lim_{n\rightarrow\infty} P_{\theta_0}\left[\bigcup_{j=1}^\infty A_{jn}^C \right] = 1 - \lim_{n\rightarrow\infty} P_{\theta_0}\left[\lim_{m\rightarrow\infty} \bigcup_{j=1}^m A_{jn}^C \right] \\
&= 1 - \lim_{n\rightarrow\infty} \lim_{m\rightarrow\infty} P_{\theta_0}\left[ \bigcup_{j=1}^m A_{jn}^C \right]
\end{aligned}

where the last step follows from B_{mn} = \bigcup_{j=1}^m A_{jn}^C being an increasing sequence in m and the continuity property of the probability measure. If we could interchange the two limits, then \lim_{n\rightarrow\infty} P_{\theta_0}\left[\bigcap_{j=1}^\infty A_{jn}\right] = 1, since for each fixed m, \lim_{n\rightarrow\infty} P_{\theta_0}\left[\bigcup_{j=1}^m A_{jn}^C\right] = 0. Yet, for any finite n, we can find \epsilon(n) > 0 such that

\lim_{m\rightarrow\infty} P_{\theta_0}[B_{mn}] > \epsilon(n). If it were possible to find an \epsilon that does not depend on n, or if we could show that \inf_n \epsilon(n) > 0, I think this would prove the result. I'm not sure, however, whether we can show this.
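For intuition on why the two limits cannot be interchanged in general, a standard double-sequence counterexample (not specific to this model) is

```latex
a_{nm} = \frac{m}{m+n}:
\qquad
\lim_{n\rightarrow\infty}\lim_{m\rightarrow\infty} a_{nm} = \lim_{n\rightarrow\infty} 1 = 1,
\qquad
\lim_{m\rightarrow\infty}\lim_{n\rightarrow\infty} a_{nm} = \lim_{m\rightarrow\infty} 0 = 0,
```

so some extra uniformity is needed before the order of the limits can be swapped.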

Am I on the right track? Any suggestions would be helpful. Thanks!

**Answer**

You don’t want a proof showing the MLE is *never* consistent with an infinite parameter space, because that’s not true. There are *many* settings with countably infinite parameter spaces that have consistent MLEs. There are even *many* settings with uncountable parameter spaces that have consistent MLEs — the usual N(\mu,\sigma^2) model with real \mu and positive \sigma^2, for example. The issue is that you need some extra conditions when the parameter space is infinite; it’s not automatic.

You're correct that the question is whether \inf_n \epsilon(n) > 0, and that it's not obvious in general. In fact, it *might* be zero or might not be, and whether it is will depend on details of the model; there isn't a general result from set theory.
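One way to see where extra conditions must enter (a sketch of this point, in the notation of the question): the union bound from the finite case still holds with countably many terms,

```latex
P_{\theta_0}\left[\bigcap_{j=1}^{\infty} A_{jn}\right]
\;\ge\; 1 - \sum_{j=1}^{\infty} P_{\theta_0}\left[A_{jn}^{C}\right],
```

so consistency would follow if the infinite sum on the right tends to zero as n \rightarrow \infty. With finitely many terms this is automatic, because each summand vanishes; with infinitely many terms it requires uniform control of the tail over j, and whether such control is available depends on the model.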

If you look at proofs of consistency when the parameter space is an interval of the reals (or of \mathbb{R}^d), they typically assume some smoothness for the dependence of the density on \theta, and then have some way to ensure that \hat\theta is eventually in some compact neighbourhood of \theta_0. Compactness + smoothness acts as a substitute for finiteness, meaning that you don't have to consider each A_{jn} separately.

**Attribution**
*Source: Link, Question Author: baruuum, Answer Author: Thomas Lumley*