In a linear regression problem with sparsity constraint, P=(P1,⋯,PN)T is the column vector of the outputs, and D=(dj,k) is the (N×M)– dimensional matrix of inputs. The objective function is

argminc∈RM(‖P−Dc‖22+λ‖c‖0)

in which ‖c‖0=#{j:cj≠0}

I learnt that this problem is NP-hard, but I don’t understand why.

What I think:There are in total N cases that we need to consider

N=C(M,1)+C(M,2)+⋯+C(M,M)

in which M is the dimension of the coefficient vector c. C(M,n)=(Mn)=M!(M−n)!n!.For each tuple of selected features, we perform OLS (without taking into account the regularizer) and record the loss:

L=RMSE+λ‖c‖0After doing N such calculations, we can choose the tuple of features that yield the smallest L.

However, I don’t know whether the result that obtained really is the solution to the original objective function since we don’t take into account the effect of regularizer in the first place.

Or this algorithm does not make sense, instead of comparing L=RMSE+λ‖c‖0 between tuples of different sizes, we can only compare the RMSE of the set of feature tuples of the same size.

**Answer**

First, note that NP-hardness is a property of a *problem*, rather than a specific *algorithm*–it puts bounds on the performance of *any* algorithm. The proof works by establishing relationships between problems to prove membership in complexity classes.

**Background**

A complexity class is a set of computational problems defined by 1) a computational resource of interest (e.g. time, memory, etc.), 2) a particular model of computation (e.g. a Turing machine), and 3) a bound describing how the resources needed to compute a solution scale with the problem size.

NP is the set of all decision problems where, if the answer is “yes”, there exists a proof that can be verified in polynomial time by a deterministic Turing machine. Informally, this means it’s possible to efficiently check the answer, even if it’s not possible to efficiently produce the answer in the first place.

A problem q is NP-complete if it’s in NP, and every problem r in NP can be reduced to q in polynomial time. This means that there exists an efficient algorithm for transforming r into q. So, if an efficient algorithm for solving q exists, it can also be used to solve r efficiently. Informally, if a can be reduced to b, then b is at least as difficult to solve as a. So, NP-complete problems are the most difficult problems in NP.

A problem q is NP-hard if every NP-complete problem can be reduced to it in polynomial time. However, q need not necessarily be in NP. Informally, NP-hard problems are at least as difficult as the most difficult problems in NP.

**Sparse regression**

The NP-hardness of sparse regression is proved in this paper:

Natarajan (1995). Sparse approximate solutions to linear systems.

They call the sparse regression problem SAS (for sparse approximate solution). The proof of NP-hardness works by reducing a problem called “exact cover with 3-sets” (a.k.a. X3C) to SAS. They explicitly show how to transform any X3C problem into an SAS problem. X3C is known to be NP-complete, which implies that SAS is NP-hard.

**Attribution***Source : Link , Question Author : meTchaikovsky , Answer Author : user20160*