Why should one use EM vs., say, gradient descent with MLE?

Mathematically, the expressions and algorithms for Expectation Maximization (EM) are often simpler for mixed models, yet it seems that almost everything (if not everything) that can be solved with EM can also be solved with MLE (by, say, the Newton-Raphson method, for expressions that have no closed form).

In the literature, though, it seems that many favour EM over other methods (including minimization of the negative LL by, say, gradient descent); is it because of its simplicity in these models? Or is it for other reasons?

Answer

I think there are some crossed wires here. The MLE, as referred to in the statistical literature, is the Maximum Likelihood Estimate. This is an estimator. The EM algorithm is, as the name implies, an algorithm which is often used to compute the MLE. These are apples and oranges.

When the MLE is not in closed form, a commonly used algorithm for finding this is the Newton-Raphson algorithm, which may be what you are referring to when you state “can also be solved with MLE”. In many problems, this algorithm works great; for “vanilla” problems, it’s typically hard to beat.
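To make the Newton-Raphson route concrete, here is a minimal sketch for one "vanilla" problem with no closed-form MLE. The choice of model (a zero-truncated Poisson with rate `lam`) and the function name `truncated_poisson_mle` are my own illustration, not something from the question. The log-likelihood is, up to a constant, sum(x) * log(lam) - n * lam - n * log(1 - exp(-lam)), and each iteration takes the step lam - score / hessian.

```python
import math

def truncated_poisson_mle(xs, tol=1e-12, max_iter=100):
    """Newton-Raphson for the rate of a zero-truncated Poisson sample.

    Assumes every x >= 1 and mean(xs) > 1 (otherwise the MLE sits at lam -> 0).
    """
    n = len(xs)
    s = sum(xs)
    lam = s / n  # the sample mean is a reasonable starting point
    for _ in range(max_iter):
        em1 = math.expm1(lam)                       # e^lam - 1
        score = s / lam - n - n / em1               # first derivative of the log-likelihood
        hess = -s / lam ** 2 + n * math.exp(lam) / em1 ** 2  # second derivative
        new_lam = lam - score / hess                # Newton-Raphson update
        if new_lam <= 0:                            # crude safeguard: stay in the valid region
            new_lam = lam / 2
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam

counts = [1, 2, 2, 3, 1, 4, 2, 1, 3, 1]  # made-up data, sample mean 2
lam_hat = truncated_poisson_mle(counts)
```

At the solution the score is zero, which here is equivalent to lam / (1 - exp(-lam)) equalling the sample mean, so that identity is an easy check on the result.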

However, there are plenty of problems where it fails, such as mixture models. My experience with various computational problems has been that while the EM algorithm is not always the fastest choice, it’s often the easiest for a variety of reasons. Many times with novel models, the first algorithm used to find the MLE will be an EM algorithm. Then, several years later, researchers may find that a significantly more complicated algorithm is significantly faster. But these algorithms are non-trivial.
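As a rough illustration of why EM tends to be the easy first choice for mixtures, here is a minimal sketch for a two-component, one-dimensional Gaussian mixture. The function name `em_two_gaussians`, the initialisation (means at the data minimum and maximum), and the fixed iteration count are all assumptions made for the example. Note that both steps are closed-form weighted averages: no derivatives, Hessians, or step sizes are needed.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(xs, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch only)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    # Assumed initialisation: equal weights, means at the extremes, pooled sd.
    w, mu, sigma = [0.5, 0.5], [min(xs), max(xs)], [sd, sd]
    history = []  # log-likelihood at the start of each iteration
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        resp, loglik = [], 0.0
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            loglik += math.log(total)
            resp.append([p[0] / total, p[1] / total])
        history.append(loglik)
        # M-step: weighted versions of the usual closed-form Gaussian MLEs.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = math.sqrt(max(var, 1e-12))  # floor guards against a collapsing component
    return w, mu, sigma, history

rng = random.Random(0)  # simulated data: two well-separated clusters
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, sds, loglik_history = em_two_gaussians(data)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is the standard guarantee of EM and a handy sanity check when implementing it for a new model.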

Additionally, I speculate that much of the popularity of the EM algorithm comes from its statistical flavor, which helps statisticians feel differentiated from numerical analysts.

Attribution
Source: Link, Question Author: Guillermo Angeris, Answer Author: Cliff AB
