# Idea and intuition behind quasi maximum likelihood estimation (QMLE)

Question(s): What is the idea and intuition behind quasi maximum likelihood estimation (QMLE; also known as pseudo maximum likelihood estimation, PMLE)? What makes the estimator work when the actual error distribution does not match the assumed error distribution?

The Wikipedia site for QMLE is fine (brief, intuitive, to the point), but I could use some more intuition and detail, perhaps also an illustration. Other references are most welcome. (I remember going over quite a few econometrics textbooks looking for material on QMLE, and to my surprise, QMLE was only covered in one or two of them, e.g. Wooldridge “Econometric Analysis of Cross Section and Panel Data” (2010), Chapter 13 Section 11, pp. 502-517.)

“What makes the estimator work when the actual error distribution does not match the assumed error distribution?”

In principle the QMLE does not “work”, in the sense of being a “good” estimator. The theory developed around the QMLE is useful mainly because it has led to misspecification tests.

What the QMLE certainly does is consistently estimate the parameter vector that minimizes the Kullback-Leibler divergence between the true distribution and the specified one. This sounds good, but minimizing this divergence does not mean that the minimized divergence is small: it may still be enormous.
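In symbols (the notation is introduced here for illustration and is not part of the question): let $g$ be the true density, $f(\cdot;\theta)$ the specified family, and $\theta^*$ the "pseudo-true" parameter. Under the usual regularity conditions, the target of the QMLE can be sketched as

$$
\theta^* \;=\; \arg\min_{\theta}\, E_g\!\left[\log\frac{g(X)}{f(X;\theta)}\right] \;=\; \arg\max_{\theta}\, E_g\big[\log f(X;\theta)\big],
\qquad
\hat\theta_n \;=\; \arg\max_{\theta}\, \frac{1}{n}\sum_{i=1}^{n}\log f(x_i;\theta) \;\xrightarrow{p}\; \theta^* .
$$

The second equality holds because $E_g[\log g(X)]$ does not depend on $\theta$. Nothing in this expression forces the divergence at $\theta^*$ to be small, and nothing forces $\theta^*$ to equal a parameter of the true distribution.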

Still, we read that there are many situations in which the QMLE is a consistent estimator of the true parameter vector. This has to be assessed case by case, but let me give one very general situation, which shows that there is nothing inherent in the QMLE that makes it consistent for the true vector…

… Rather, it is the fact that it coincides with another estimator that is always consistent (maintaining the ergodic-stationary sample assumption): the old-fashioned Method of Moments estimator.

In other words, when in doubt about the distribution, a strategy to consider is “always specify a distribution for which the Maximum Likelihood estimator for the parameters of interest coincides with the Method of Moments estimator”: in this way, no matter how far off the mark your distributional assumption is, the estimator will at least be consistent.

You can take this strategy to ridiculous extremes: assume that you have a very large i.i.d. sample from a random variable, where all values are positive. Go on and assume that the random variable is normally distributed and apply maximum likelihood for the mean and variance: your QMLE will be consistent for the true values.
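This extreme case is easy to check numerically. The following is a minimal sketch, assuming purely for illustration that the positive variable is exponential with scale 2 (so the true mean is 2 and the true variance is 4); the function name `gaussian_qmle` and the sample size are hypothetical choices, not anything from the question. The Gaussian log-likelihood is maximized in closed form by the sample mean and the sample variance with divisor $n$, which are exactly the Method of Moments estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: the true distribution is Exponential(scale=2),
# so every draw is positive, the true mean is 2 and the true variance is 4.
true_mean, true_var = 2.0, 4.0
x = rng.exponential(scale=2.0, size=200_000)

def gaussian_qmle(sample):
    """Closed-form maximizer of the Gaussian log-likelihood in (mu, sigma^2).

    These are the sample mean and the divisor-n sample variance,
    i.e. the Method of Moments estimators of the first two moments.
    """
    mu_hat = sample.mean()
    sigma2_hat = ((sample - mu_hat) ** 2).mean()  # divisor n, not n - 1
    return mu_hat, sigma2_hat

mu_hat, sigma2_hat = gaussian_qmle(x)
print(f"mean:     estimate {mu_hat:.4f}, truth {true_mean}")
print(f"variance: estimate {sigma2_hat:.4f}, truth {true_var}")
```

The estimates settle near 2 and 4 even though a normal density is a poor description of exponential data; the consistency comes from the sample moments, not from the normality assumption.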

Of course this raises the question: why pretend to apply MLE, when what we are essentially doing is relying on, and hiding behind, the strengths of the Method of Moments (which also guarantees asymptotic normality)?

In other, more refined cases, the QMLE can be shown to be consistent for the parameters of interest if the conditional mean function is specified correctly even though the distribution is not (this is, for example, the case for pooled Poisson QMLE; see Wooldridge).
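A small simulation can illustrate the conditional-mean case. This is a simplified cross-sectional sketch, not Wooldridge's pooled panel setup: it assumes an exponential conditional mean $E[y \mid x] = \exp(\beta_0 + \beta_1 x)$ and generates $y$ with a multiplicative gamma error, so $y$ is continuous and overdispersed and certainly not Poisson. The coefficient values, the gamma parameters, and the function name `poisson_qmle` are all hypothetical choices made for the example. The Poisson quasi-log-likelihood is maximized by Newton's method, using the score $X'(y - \mu)$ and the negative Hessian $X'\operatorname{diag}(\mu)X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative assumption: the conditional mean is exp(0.5 + 0.3 * x).
beta_true = np.array([0.5, 0.3])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu = np.exp(X @ beta_true)

# y is NOT Poisson: it is continuous, with a mean-one gamma multiplicative
# error (shape 2, scale 0.5), so E[y | x] = mu but Var[y | x] = 0.5 * mu**2.
y = mu * rng.gamma(shape=2.0, scale=0.5, size=n)

def poisson_qmle(X, y, n_iter=50, tol=1e-10):
    """Maximize the Poisson quasi-log-likelihood sum(y * xb - exp(xb)) by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        m = np.exp(X @ beta)              # fitted conditional mean
        score = X.T @ (y - m)             # gradient of the quasi-log-likelihood
        neg_hess = X.T @ (X * m[:, None])  # negative Hessian, positive definite
        step = np.linalg.solve(neg_hess, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

beta_hat = poisson_qmle(X, y)
print("estimate:", beta_hat)
print("truth:   ", beta_true)
```

The point estimates land close to the assumed coefficients because the first-order condition $X'(y - \mu) = 0$ only requires the conditional mean to be right. The usual Poisson standard errors would not be valid here; a robust (sandwich) variance estimate would be needed, which this sketch does not compute.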