Let’s say that we have X of shape (2, 5)

and y of shape (2,). This works:

`np.linalg.lstsq(X, y)`

We would expect this to work only if X were of shape (N, 5) where N >= 5.

But why, and how? We do get back 5 weights as expected, but how is this problem solved?

Isn’t it like we have 2 equations and 5 unknowns?

How could numpy solve this?

Does it do something like interpolation to create more artificial equations?
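A minimal reproduction of the setup (the data here is random and purely illustrative):

```python
import numpy as np

# Hypothetical data matching the question: 2 equations, 5 unknowns.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
y = rng.standard_normal(2)

# lstsq happily returns 5 weights even though X has only 2 rows.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w.shape)  # (5,)
print(rank)     # 2 -- the effective rank is only 2
```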

**Answer**

My understanding is that `numpy.linalg.lstsq` relies on the LAPACK routine `dgelsd`.

The problem is to solve:

\min_{\mathbf{x}} \| A\mathbf{x} - \mathbf{b} \|_2

Of course, this does not have a *unique* solution for a matrix A whose rank is less than the number of columns (the length of \mathbf{x}). In the case of an underdetermined system, `dgelsd` provides a solution \mathbf{z} such that:

- A\mathbf{z} = \mathbf{b}
- \| \mathbf{z} \|_2 \leq \| \mathbf{x} \|_2 for all \mathbf{x} that satisfy A\mathbf{x} = \mathbf{b} (i.e. \mathbf{z} is the minimum norm solution to the underdetermined system).

For example, if the system is x + y = 1, `numpy.linalg.lstsq` would return x = 0.5, y = 0.5 (the minimum norm solution).
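You can check this example directly:

```python
import numpy as np

# One equation, two unknowns: x + y = 1.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# lstsq picks the solution of minimum Euclidean norm.
z, *_ = np.linalg.lstsq(A, b, rcond=None)
print(z)  # [0.5 0.5]
```

Among all solutions (x, 1 - x), the point (0.5, 0.5) is the one closest to the origin.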

### How does dgelsd work?

The routine `dgelsd` computes the singular value decomposition (SVD) of A.

I’ll just sketch the idea behind using an SVD to solve a linear system. The singular value decomposition is a factorization A = U \Sigma V' where U and V are orthogonal matrices and \Sigma is a diagonal matrix whose diagonal entries are known as singular values.

The effective rank of matrix A will be the number of singular values that are effectively non-zero (i.e. sufficiently different from zero relative to machine precision, etc.). Let S be the diagonal matrix of the non-zero singular values. The SVD is thus:

A = U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V'

The pseudo-inverse of A is given by:

A^\dagger = V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U'

Consider the solution \mathbf{x} = A^\dagger \mathbf{b}. Then:

\begin{align*}
A\mathbf{x} - \mathbf{b} &= U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V' V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U' \mathbf{b} - \mathbf{b} \\
&= U \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} U' \mathbf{b} - \mathbf{b}
\end{align*}

There are basically two cases here:

- The number of non-zero singular values (i.e. the size of matrix I) is less than the length of \mathbf{b}. The solution here won’t be exact; we’ll solve the linear system in the least squares sense.
- The number of non-zero singular values equals the length of \mathbf{b}. Then the bracketed matrix reduces to the identity, so U U' \mathbf{b} = \mathbf{b} and A\mathbf{x} - \mathbf{b} = \mathbf{0}; the system is solved exactly.

This last part is a bit tricky; you need to keep track of matrix dimensions and use the fact that U is an orthogonal matrix.
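The sketch above can be turned into code. This is a minimal illustration (not the actual `dgelsd` implementation): invert only the effectively non-zero singular values and apply the pseudo-inverse, then compare against `lstsq`. The function name `svd_solve` and the tolerance are assumptions for this sketch.

```python
import numpy as np

def svd_solve(A, b, tol=1e-12):
    """Solve A x = b via the SVD pseudo-inverse (minimum norm solution)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert only the effectively non-zero singular values (the S block);
    # zero out the rest, mimicking the rank cutoff discussed above.
    s_inv = np.where(s > tol * s.max(), 1.0 / s, 0.0)
    # x = V S^{-1} U' b
    return Vt.T @ (s_inv * (U.T @ b))

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))  # underdetermined: 2 equations, 5 unknowns
b = rng.standard_normal(2)

x_svd = svd_solve(A, b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
# Both give the same minimum norm solution, and both satisfy A x = b.
```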

### Equivalence of pseudo-inverse

When A has linearly independent rows (e.g. a fat matrix, with more columns than rows), then:

A^\dagger = A'\left(AA' \right)^{-1}

For an underdetermined system, you can show that the pseudo-inverse gives you the minimum norm solution.

When A has linearly independent columns (e.g. a skinny matrix, with more rows than columns), then:

A^\dagger = \left(A'A \right)^{-1}A'
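Both closed-form expressions can be checked numerically against `np.linalg.pinv` (the matrices here are random examples; Gaussian matrices have full rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fat matrix with linearly independent rows: A^+ = A' (A A')^{-1}.
A_fat = rng.standard_normal((2, 5))
pinv_fat = A_fat.T @ np.linalg.inv(A_fat @ A_fat.T)

# Skinny matrix with linearly independent columns: A^+ = (A' A)^{-1} A'.
A_skinny = rng.standard_normal((5, 2))
pinv_skinny = np.linalg.inv(A_skinny.T @ A_skinny) @ A_skinny.T

print(np.allclose(pinv_fat, np.linalg.pinv(A_fat)))        # True
print(np.allclose(pinv_skinny, np.linalg.pinv(A_skinny)))  # True
```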

**Attribution**
*Source : Link , Question Author : George Pligoropoulos , Answer Author : Matthew Gunn*