How does NumPy solve least squares for underdetermined systems?

Let’s say that we have X of shape (2, 5)
and y of shape (2,)

This works: np.linalg.lstsq(X, y)

We would expect this to work only if X was of shape (N,5) where N>=5
But why and how?

We do get back 5 weights as expected but how is this problem solved?

Isn’t it like we have 2 equations and 5 unknowns?
How could numpy solve this?
It must do something like interpolation to create more artificial equations?..


My understanding is that numpy.linalg.lstsq relies on the LAPACK routine dgelsd.

The problem is to solve:


Of course, this does not have a unique solution for a matrix A whose rank is less than length of vector \mathbf{b}. In the case of an undetermined system, dgelsd provides a solution \mathbf{z} such that:

  • A\mathbf{z} = \mathbf{b}
  • \| \mathbf{z} \|_2 \leq \|\mathbf{x} \|_2 for all \mathbf{x} that satisfy A\mathbf{x} = \mathbf{b}. (i.e. \mathbf{z} is the minimum norm solution to the undetermined system.

Example, if system is x + y = 1, numpy.linalg.lstsq would return x = .5, y = .5.

How does dgelsd work?

The routine dgelsd computes the singular value decomposition (SVD) of A.

I’ll just sketch the idea behind using an SVD to solve a linear system. The singular value decomposition is a factorization U \Sigma V’ = A where U and V are orthogonal matrices and \Sigma is a diagonal matrix where the diagonal entries are known as singular values.

The effective rank of matrix A will be the number of singular values that are effectively non-zero (i.e. sufficiently different from zero relative to machine precision etc…). Let S be a diagonal matrix of the non-zero singular values. The SVD is thus:

A = U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V’

The pseudo-inverse of A is given by:

A^\dagger = V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U’

Consider the solution \mathbf{x} = A^\dagger \mathbf{b}. Then:

A\mathbf{x} – \mathbf{b} &= U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V’ V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U’ \mathbf{b} – \mathbf{b} \\ &= U \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} U’ \mathbf{b} – \mathbf{b}\\

There basically two cases here:

  1. The number of non-zero singular values (i.e. the size of matrix I) is less than the length of \mathbf{b}. The solution here won’t be exact; we’ll solve the linear system in the least squares sense.
  2. A\mathbf{x} – \mathbf{b} = \mathbf{0}

This last part is a bit tricky… need to keep track of matrix dimensions and use that U is an orthogonal matrix.

Equivalence of pseudo-inverse

When A has linearly independent rows, (eg. we have a fat matrix), then:
A^\dagger = A’\left(AA’ \right)^{-1}

For an undetermined system, you can show that the pseudo-inverse gives you the minimum norm solution.

When A has linearly independent columns, (eg. we have a skinny matrix), then:
A^\dagger = \left(A’A \right)^{-1}A’

Source : Link , Question Author : George Pligoropoulos , Answer Author : Matthew Gunn

Leave a Comment