This question is inspired by the long discussion in the comments here: How does linear regression use the normal distribution?

In the usual linear regression model, for simplicity here written with only one predictor:

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

where the $x_i$ are known constants and the $\epsilon_i$ are zero-mean, independent error terms. If we additionally assume a normal distribution for the errors, then the usual least-squares estimators and the maximum-likelihood estimators of $\beta_0, \beta_1$ are identical.

So my easy question: does there exist any other distribution for the error terms such that the MLE is identical to the ordinary least-squares estimator? The one implication is easy to show, the other not so much.
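The stated equivalence under normality is easy to check numerically. A minimal sketch (simulated data and all names are hypothetical): minimizing the Gaussian negative log-likelihood in the coefficients recovers the closed-form OLS solution.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical simulated data for illustration.
rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 1.0 - 0.5 * x + rng.normal(size=40)
X = np.column_stack([np.ones_like(x), x])

# OLS in closed form.
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Gaussian negative log-likelihood in (beta0, beta1) with sigma fixed:
# up to additive/multiplicative constants it is the sum of squared errors.
def nll(b):
    return 0.5 * np.sum((y - X @ b) ** 2)

mle = minimize(nll, np.zeros(2), method="BFGS").x
print(np.allclose(ols, mle, atol=1e-4))  # True
```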

**Answer**

In maximum likelihood estimation, we calculate

$$\hat\beta_{\text{ML}}:\quad \sum \frac{\partial \ln f(\epsilon_i)}{\partial \beta} = 0 \implies \sum \frac{f'(\epsilon_i)}{f(\epsilon_i)}\,x_i = 0$$

the last relation taking into account the linearity structure of the regression equation.

In comparison, the OLS estimator satisfies

$$\sum \epsilon_i x_i = 0$$
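This orthogonality condition holds for any sample, by construction of the OLS estimator via the normal equations $X'e = 0$. A quick numeric sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data; the conditions below hold for any sample
# by construction of the OLS estimator.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)

X = np.column_stack([np.ones_like(x), x])      # intercept + predictor
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols

# Normal equations X'e = 0: residuals sum to zero and are
# orthogonal to the predictor.
print(abs(resid.sum()) < 1e-8, abs((resid * x).sum()) < 1e-8)  # True True
```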

In order to obtain identical algebraic expressions for the slope coefficients, we need a density for the error term such that

$$\frac{f'(\epsilon_i)}{f(\epsilon_i)} = \pm c\,\epsilon_i \implies f'(\epsilon_i) = \pm c\,\epsilon_i f(\epsilon_i)$$

These are differential equations of the form $y' = \pm c\,xy$ that have the solutions

$$\int \frac{1}{y}\,dy = \pm c \int x\,dx \implies \ln y = \pm\frac{1}{2}cx^2 + \text{const.}$$

$$\implies y = f(\epsilon) = \exp\left\{\pm\frac{1}{2}c\epsilon^2\right\}$$
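As a sanity check on this integration, SymPy's ODE solver recovers the same kernel for the plus-sign case (the minus case is analogous); this is purely an illustrative verification:

```python
import sympy as sp

x = sp.symbols('x')
c = sp.symbols('c', positive=True)
y = sp.Function('y')

# Solve y' = c*x*y; the solution should be C1 * exp(c*x**2 / 2).
sol = sp.dsolve(sp.Eq(y(x).diff(x), c * x * y(x)), y(x))
print(sol.rhs.has(sp.exp(c * x**2 / 2)))  # True
```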

Any function that has this kernel and integrates to unity over an appropriate domain will make the MLE and the OLS estimator of the slope coefficients identical. Namely, we are looking for

$$g(x) = A\exp\left\{\pm\frac{1}{2}cx^2\right\}:\quad \int_a^b g(x)\,dx = 1$$
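For the minus-sign kernel, which is the normal case, one can verify symbolically that the score $f'/f$ is linear in the error, as required. A small check with SymPy (symbol names are arbitrary):

```python
import sympy as sp

eps = sp.symbols('epsilon')
c, A = sp.symbols('c A', positive=True)

# Minus-sign kernel: the (unnormalized) normal density.
g = A * sp.exp(-sp.Rational(1, 2) * c * eps**2)
score = sp.simplify(sp.diff(g, eps) / g)
print(score.equals(-c * eps))  # True
```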

Is there such a g that is not the normal density (or the half-normal or the derivative of the error function)?

Certainly. But one more thing to consider is the following: if one uses the plus sign in the exponent, together with, say, a symmetric support around zero, one obtains a density with a unique *minimum* in the middle and two local maxima at the boundaries of the support.
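This U-shape is easy to see numerically. The sketch below normalizes the plus-sign kernel on the arbitrarily chosen support $[-2, 2]$ and checks that the density grows toward the boundaries:

```python
import numpy as np

c, b = 1.0, 2.0
xs = np.linspace(-b, b, 200001)
kernel = np.exp(0.5 * c * xs**2)           # plus sign in the exponent

# Riemann-sum normalization so g integrates to ~1 on [-b, b].
A = 1.0 / (kernel.sum() * (xs[1] - xs[0]))

def g(t):
    return A * np.exp(0.5 * c * t**2)

# Unique minimum at the center, maxima at the boundary of the support.
print(g(0.0) < g(1.0) < g(b))  # True
```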

**Attribution**

*Source : Link , Question Author : kjetil b halvorsen , Answer Author : Alecos Papadopoulos*