What regression/estimation is not an MLE?

I just learned, rigorously, that OLS is a special case of MLE. It surprises me that popular and supposedly "reliable" sources such as ResearchGate and this do not mention this important connection between MLE and OLS!

I am not sure if there are any simple regression or estimation methods that do not belong to MLE.

Least squares is indeed maximum likelihood if the errors are iid normal, but if they aren't iid normal, least squares is not maximum likelihood. For example, if my errors were logistic, least squares wouldn't be a terrible idea, but it wouldn't be maximum likelihood.
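To illustrate, here is a small sketch (in Python, with made-up simulated data) comparing the least-squares fit with the actual MLE under iid logistic errors; the two estimates are close but not the same:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
# logistic errors (location 0, scale 1); true line is y = 1 + 2x
y = 1 + 2 * x + rng.logistic(0, 1, n)

# least squares: closed form
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# MLE under logistic errors (scale taken as known, = 1):
# -log f(z) = z + 2 log(1 + e^{-z}) up to a constant
def negloglik(beta):
    z = y - X @ beta
    return np.sum(z + 2 * np.log1p(np.exp(-z)))

beta_mle = minimize(negloglik, beta_ols, method="BFGS").x

print(beta_ols, beta_mle)  # similar, but not identical
```

Both estimators are consistent here; they just answer to different criteria, and only the second one maximizes the logistic likelihood.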

Lots of estimators are not maximum likelihood estimators; while maximum likelihood estimators typically have a number of useful and attractive properties they’re not the only game in town (and indeed not even always a great idea).

A few examples of other estimation methods would include

• method of moments (this involves equating enough sample and population moments to solve for parameter estimates; sometimes this turns out to be maximum likelihood but usually it doesn’t)

For example, equating first and second moments to estimate the parameters of a gamma distribution or a uniform distribution; not maximum likelihood in either case.
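For the gamma case, a minimal sketch (simulated data; since the mean is $k\theta$ and the variance is $k\theta^2$, the two moment equations solve directly for shape and scale):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=5000)

m, v = x.mean(), x.var()   # sample mean and variance
theta_hat = v / m          # scale: Var = k * theta^2, E = k * theta
k_hat = m * m / v          # shape
print(k_hat, theta_hat)    # near 3 and 2, but not the MLE
```

The gamma MLE for the shape has no closed form (it involves the digamma function), which is exactly why the moment estimator differs from it.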

• method of quantiles (equating enough sample and population quantiles to solve for parameter estimates; occasionally this is maximum likelihood but usually it isn't),
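A quick sketch of the quantile idea (my choice of example: fitting a normal by matching the 25th and 75th percentiles, since $q_p = \mu + \sigma z_p$ gives two equations in two unknowns):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(10, 3, 4000)

q25, q75 = np.quantile(x, [0.25, 0.75])
z25, z75 = norm.ppf([0.25, 0.75])   # standard normal quantiles
# solve q_p = mu + sigma * z_p for the two unknowns
sigma_hat = (q75 - q25) / (z75 - z25)
mu_hat = q75 - sigma_hat * z75
print(mu_hat, sigma_hat)
```

These estimates are consistent but less efficient than the normal MLE (the sample mean and standard deviation), so the two methods genuinely differ.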

• minimizing some other measure of lack of fit than $-\log\mathcal{L}$ (e.g. minimum chi-square, minimum K-S distance).

When fitting linear-regression-type models, you could for example look at robust regression methods (some of which do correspond to ML under some particular error distribution, but many of which do not).
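As a sketch of robust regression (my example: M-estimation with the Huber loss via `scipy.optimize.least_squares`, on simulated data with a few gross outliers):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 1 + 2 * x + rng.normal(0, 1, 100)
y[:5] += 25   # a few gross outliers

def resid(beta):
    return y - (beta[0] + beta[1] * x)

ols = least_squares(resid, [0.0, 0.0]).x                        # plain least squares
huber = least_squares(resid, [0.0, 0.0], loss="huber", f_scale=1.0).x
print(ols, huber)   # the Huber fit is much less affected by the outliers
```

The Huber loss is quadratic for small residuals and linear for large ones, which bounds the influence any single point can have on the fit.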

In the case of simple linear regression, I show an example here of two methods of fitting lines that are not maximum likelihood – there the slope is estimated by setting to zero some measure of correlation between the residuals and the predictor other than the usual Pearson correlation.
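One such scheme can be sketched as follows (my choice of the Spearman rank correlation as the alternative measure, on simulated heavy-tailed data; choosing Pearson instead would reproduce least squares):

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.optimize import brentq

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 150)
y = 1 + 2 * x + rng.standard_t(3, 150)   # heavy-tailed noise

# choose the slope b that makes the Spearman correlation between
# the residuals y - b*x and the predictor x equal to zero
def resid_corr(b):
    return spearmanr(y - b * x, x).correlation

b_hat = brentq(resid_corr, 0, 4)         # the correlation changes sign in (0, 4)
a_hat = np.median(y - b_hat * x)         # intercept from the median residual
print(a_hat, b_hat)
```

Because ranks are unaffected by outliers in $y$, this slope estimate is considerably more resistant than the least-squares one.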

Another example would be Tukey's resistant line/Tukey's three-group line (e.g. see ?line in R). There are many other possibilities, though many of them don't generalize readily to the multiple-regression situation.
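A single-pass sketch of the three-group idea (R's `line` iterates the adjustment; this simplified version just takes the slope from the medians of the outer thirds of the data ordered by $x$):

```python
import numpy as np

def three_group_line(x, y):
    """One pass of Tukey's three-group resistant line: slope from the
    medians of the outer thirds (ordered by x), intercept from the
    median residual."""
    order = np.argsort(x)
    x, y = np.asarray(x)[order], np.asarray(y)[order]
    k = len(x) // 3
    left, right = slice(0, k), slice(len(x) - k, len(x))
    b = (np.median(y[right]) - np.median(y[left])) / \
        (np.median(x[right]) - np.median(x[left]))
    a = np.median(y - b * x)
    return a, b

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 90)
y = 1 + 2 * x + rng.normal(0, 1, 90)
a_hat, b_hat = three_group_line(x, y)
print(a_hat, b_hat)
```

Because everything is built from medians, a handful of wild points in either variable has only a bounded effect on the fitted line.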