I was asked this question the other day and had never considered it before.

My intuition comes from the advantages of each estimator. Maximum likelihood is preferable when we are confident in the data-generating process because, unlike the method of moments, it makes use of knowledge of the entire distribution. Since MoM estimators use only the information contained in the moments, it seems the two methods should produce the same estimates when the sufficient statistics for the parameter we are attempting to estimate are exactly the moments of the data.

I’ve checked this result with a few distributions. The normal (unknown mean and variance), exponential, and Poisson all have sufficient statistics equal to their moments, and their MLEs and MoM estimators coincide (not strictly true for cases like the Poisson, where there are multiple MoM estimators). For a $\text{Uniform}(0,\theta)$, however, the sufficient statistic for $\theta$ is $\max(X_1,\ldots,X_N)$, and the MoM and MLE estimators differ.
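The contrast above can be checked numerically. A minimal sketch (function names are my own, for illustration): for the exponential, both routes lead to $1/\bar{x}$, while for the uniform the two estimators are genuinely different statistics.

```python
import statistics

def uniform_estimates(data):
    """Uniform(0, theta): MoM solves E[X] = theta/2, so theta_mom = 2 * x-bar;
    the MLE is the sample maximum (the sufficient statistic)."""
    theta_mom = 2 * statistics.mean(data)
    theta_mle = max(data)
    return theta_mom, theta_mle

def exponential_rate_estimates(data):
    """Exponential(rate lambda): MoM solves E[X] = 1/lambda and the MLE
    maximises the likelihood; both give 1 / x-bar."""
    xbar = statistics.mean(data)
    return 1 / xbar, 1 / xbar

mom, mle = uniform_estimates([0.9, 2.1, 3.2, 3.9])
print(mom, mle)  # MoM is about 5.05, MLE is 3.9: different estimators
```

Note that here the MoM estimate of $\theta$ can even fall below the sample maximum, an impossible parameter value, which the MLE never does.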

I thought perhaps this was a quirk of the exponential family, but for a Laplace distribution with known mean $\mu$ the sufficient statistic is $\frac{1}{n} \sum |X_i - \mu|$, and the MLE and the MoM estimator for the variance are not equal.
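A quick sketch of that Laplace comparison (function name is my own): the MLE of the scale $b$ is the mean absolute deviation, while the MoM estimator matches the variance $\operatorname{Var}(X) = 2b^2$, and the two disagree on any typical sample.

```python
import math
import statistics

def laplace_scale_estimates(data, mu=0.0):
    """Laplace(mu, b) with mu known.
    MLE: b_mle = mean |x_i - mu|  (a function of the sufficient statistic).
    MoM: match Var(X) = 2 b^2, so b_mom = sqrt(m2 / 2),
         where m2 is the mean of (x_i - mu)^2."""
    abs_dev = [abs(x - mu) for x in data]
    b_mle = statistics.mean(abs_dev)
    m2 = statistics.mean([(x - mu) ** 2 for x in data])
    b_mom = math.sqrt(m2 / 2)
    return b_mle, b_mom

b_mle, b_mom = laplace_scale_estimates([1.0, -2.0, 3.0, -4.0])
print(b_mle, b_mom)  # 2.5 versus roughly 1.94: not equal
```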

I’ve so far been unable to show any general result. Does anybody know of general conditions? Even a counterexample would help me refine my intuition.

**Answer**

A general answer is that a method-of-moments estimator is not invariant under a bijective change of parameterisation, while a maximum likelihood estimator is. Therefore, they almost never coincide (almost never across all possible transforms).

Furthermore, as stated in the question, there are many MoM estimators, infinitely many in fact. They are all based on the empirical distribution $\hat{F}$, which may be seen as a non-parametric MLE of $F$, although this does not bear directly on the question.
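The multiplicity of MoM estimators is easy to exhibit. A minimal sketch under an exponential model (function names are my own): matching the first moment $E[X] = 1/\lambda$ and matching the second moment $E[X^2] = 2/\lambda^2$ yield different estimators of the same rate $\lambda$.

```python
import math
import statistics

def exp_rate_from_first_moment(data):
    """MoM via E[X] = 1/lambda  =>  lambda = 1 / x-bar (also the MLE)."""
    return 1 / statistics.mean(data)

def exp_rate_from_second_moment(data):
    """MoM via E[X^2] = 2 / lambda^2  =>  lambda = sqrt(2 / m2)."""
    m2 = statistics.mean([x ** 2 for x in data])
    return math.sqrt(2 / m2)

data = [1.0, 2.0, 3.0]
print(exp_rate_from_first_moment(data), exp_rate_from_second_moment(data))
# the two moment equations give different estimates on the same sample
```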

Actually, a more appropriate way to frame the question is to ask when a moment estimator is sufficient; but by the Pitman–Koopman–Darmois lemma this forces the distribution of the data to belong to an exponential family, a case in which the answer is already known.

Note: In the Laplace distribution with known mean, the problem is equivalent to observing the absolute deviations from that mean, which are exponential variates and hence belong to an exponential family.

**Attribution**
*Source: Link, Question Author: Upside, Answer Author: Xi’an*