# Why do we call the equations of least squares estimation in linear regression the *normal equations*?

When we estimate the parameters of a linear regression, we form as many normal equations as the linear model has unknowns. Why are these equations called normal equations?

I’ll give what is perhaps the most common understanding, then some additional details.

Normal is a term in geometry (Wikipedia):

In geometry, a normal is an object such as a line or vector that is perpendicular to a given object.

which in turn appears to come from a term for a carpenter’s or mason’s square [1]:

NORM and NORMAL. According to the OED, in Latin norma could mean a square used by carpenters, masons, etc., for obtaining right angles, a right angle or a standard or pattern of practice or behaviour. These meanings are reflected in the mathematical terms based on norm and normal.

and from geometry the term moves into vector spaces.

The direct answer for “normal equations” is given here: http://mathworld.wolfram.com/NormalEquation.html

It is called a normal equation because $b-Ax$ is normal to the range of $A$.

(In the usual regression notation that’s ‘$y-Xb$ is normal to the range of $X$’.)

Literally, the least squares residual is perpendicular (at right angles) to the space spanned by $X$.
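
Written as an equation, this perpendicularity says the residual has zero inner product with every column of $X$. As a short sketch, writing $\hat{\beta}$ for the least squares coefficient vector (the $b$ above) and $X^\top$ for the transpose of $X$ (notation assumed here, not taken from the sources quoted):

$$
X^\top (y - X\hat{\beta}) = 0 \quad\Longleftrightarrow\quad X^\top X\,\hat{\beta} = X^\top y .
$$

That is one equation per column of $X$, which is why there are as many normal equations as unknown coefficients.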

The $y$-vector lies in $n$ dimensions. The columns of $X$ span a subspace of $p$ of those dimensions (or $p+1$, depending on how your notation is set up; if $X$ is of full rank, it’s the number of columns of $X$). The least squares fit $X\hat{\beta}$ is the nearest point in that subspace to the $y$-vector (indeed, it is literally the projection of $y$ onto the space spanned by $X$). Minimizing the sum of squares therefore forces the difference $y-X\hat{\beta}$ to be orthogonal to the space spanned by $X$. (If it were not, some other point in that space would lie closer to $y$, giving a still smaller sum of squares.)
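
A quick numerical check can make the geometry concrete. The sketch below uses NumPy on simulated data (the sample size, seed, and coefficient values are arbitrary assumptions chosen only for illustration). It solves the normal equations $X^\top X\hat{\beta} = X^\top y$ directly, checks that the residual has essentially zero inner product with each column of $X$, and shows that moving away from $\hat{\beta}$ increases the sum of squares.

```python
import numpy as np

# Simulated data: all sizes and values here are illustrative assumptions.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residual = y - X @ beta_hat

# Inner product of the residual with each column of X:
# zero up to floating-point rounding, i.e. the residual is normal to range(X).
print("X'(y - X beta_hat) =", X.T @ residual)

# Cross-check against NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("matches lstsq:", np.allclose(beta_hat, beta_lstsq))


def sse(beta):
    """Sum of squared residuals for a given coefficient vector."""
    r = y - X @ beta
    return float(r @ r)


# Any move away from beta_hat gives a larger sum of squares.
beta_other = beta_hat + np.array([0.05, -0.02, 0.01])
print("SSE at beta_hat:", sse(beta_hat))
print("SSE at perturbed beta:", sse(beta_other))
```

Solving $X^\top X\hat{\beta} = X^\top y$ explicitly is used here only because it mirrors the normal equations; in practice a QR- or SVD-based routine such as `np.linalg.lstsq` is usually preferred, since forming $X^\top X$ squares the condition number of the problem.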

However, as whuber suggests in comments, it’s not quite so clear cut.

Looking at [1] again:

The term NORMAL EQUATION in least squares was introduced by Gauss in 1822 [James A. Landau]. Kruskal & Stigler’s “Normative Terminology” (in Stigler (1999)) consider various hypotheses about where the term came from but do not find any very satisfactory.

However, the method of normal equations is often credited to Legendre, 1805.

[1] Miller, J. (ed.), “Earliest known uses of some of the words of mathematics, N”, in Earliest known uses of some of the words of mathematics.