# When does maximum likelihood work, and when doesn’t it?

I’m confused about the maximum likelihood method as compared to, e.g., computing the arithmetic mean.

When and why does maximum likelihood produce “better” estimates than e.g. arithmetic mean? How is this verifiable?

While the arithmetic mean $\bar{x}$ may sound like the “natural” estimator, one could ask why it should be preferred to the MLE! The only sure property of the arithmetic mean is that it is an unbiased estimator of $\mathbb{E}[X]$ whenever this expectation is defined (think of the Cauchy distribution as a counter-example). The MLE, by contrast, enjoys a wide range of properties under regularity conditions on the likelihood function. To borrow from the Wikipedia page, the MLE is

1. consistent
2. asymptotically normal
3. efficient in that it achieves the minimum asymptotic variance
4. invariant under bijective transforms
5. within the parameter set even for constrained parameter sets
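Property 4 (invariance) is easy to check numerically. The sketch below, assuming an i.i.d. exponential sample, compares the closed-form MLE of the mean with a numerical maximisation of the likelihood in the rate parameterisation $\lambda = 1/\text{mean}$; invariance says the second should equal the reciprocal of the first.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)  # true mean 2, true rate 0.5

# Closed form: the MLE of the mean of an exponential sample is the sample mean
mean_mle = x.mean()

# Numerically maximise the likelihood in the *rate* parameterisation:
# log L(lambda) = n log(lambda) - lambda * sum(x)
def neg_loglik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

rate_mle = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded").x

# Invariance (property 4): MLE of lambda = 1 / (MLE of the mean)
print(rate_mle, 1.0 / mean_mle)
```

The agreement between the two values is exactly the invariance-under-bijective-transforms property: maximising in any reparameterisation gives the transformed MLE.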

In comparison, the arithmetic mean also satisfies most of those properties for regular enough distributions, with the exception of 4 and 5. In the case of exponential families, the MLE and the arithmetic mean coincide when estimating the parameter in the mean parameterisation (but not in other parameterisations). And the MLE exists for a sample from the Cauchy distribution, whereas the arithmetic mean is not even a consistent estimator of the location.
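The Cauchy case can be illustrated with a small simulation. Assuming a standard Cauchy sample (location 0, scale 1), the sketch below contrasts the sample mean, which does not converge, with a numerical MLE of the location obtained by minimising the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.standard_cauchy(5_000)  # true location 0, scale 1

# Arithmetic mean: not consistent for the Cauchy location
# (the expectation does not exist, so the mean keeps fluctuating with n)
print("sample mean:", x.mean())

# MLE of the location theta (scale fixed at 1):
# minimise -log L(theta) = sum log(1 + (x_i - theta)^2) + const
def neg_loglik(theta):
    return np.sum(np.log(1.0 + (x - theta) ** 2))

res = minimize_scalar(neg_loglik, bracket=(-1.0, 1.0))
print("location MLE:", res.x)
```

With a few thousand observations the MLE sits close to the true location, while the sample mean remains erratic from one simulation run to the next.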

However, when turning to finite-sample optimality properties like minimaxity or admissibility, it may happen that the MLE is neither minimax nor admissible. For instance, the Stein effect shows that there exist estimators with a uniformly smaller quadratic risk, i.e., smaller for all values of the parameter, under some constraints on the distribution of the sample and the dimension of the parameter. This is the case when $x\sim\mathcal{N}_p(\theta,I_p)$ and $p\ge 3$.
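The Stein effect is straightforward to see by simulation. The sketch below, assuming $x\sim\mathcal{N}_p(\theta,I_p)$ with $p=10$ and an arbitrary choice of true mean $\theta=(1,\dots,1)$, estimates the quadratic risk of the MLE (which is $x$ itself) against the James–Stein estimator $\left(1-\frac{p-2}{\|x\|^2}\right)x$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_rep = 10, 20_000
theta = np.ones(p)  # hypothetical true mean, any value works

# One observation x ~ N_p(theta, I_p) per replication
x = rng.normal(theta, 1.0, size=(n_rep, p))

# MLE is x itself; its quadratic risk is E||x - theta||^2 = p
mle_risk = np.mean(np.sum((x - theta) ** 2, axis=1))

# James-Stein: shrink x towards the origin by a data-dependent factor
shrink = 1.0 - (p - 2) / np.sum(x ** 2, axis=1, keepdims=True)
js = shrink * x
js_risk = np.mean(np.sum((js - theta) ** 2, axis=1))

print("MLE risk:", mle_risk, "James-Stein risk:", js_risk)
```

The James–Stein risk comes out strictly below $p$, which is why the MLE is inadmissible under quadratic loss in this setting; the same dominance holds for every value of $\theta$ once $p\ge 3$.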