Linear regression with $\ell_0$ regularization

In a linear regression problem with a sparsity constraint, $P = (P_1, \ldots, P_N)^T$ is the column vector of outputs, and $D = (d_{j,k})$ is the $(N \times M)$-dimensional matrix of inputs. The objective function is

$$\underset{c \in \mathbb{R}^M}{\operatorname{argmin}} \left( \| P - Dc \|_2^2 + \lambda \| c \|_0 \right)$$

in which $\|c\|_0 = \#\{j : c_j \neq 0\}$.
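For concreteness, here is a minimal sketch of evaluating this objective for a given coefficient vector, assuming NumPy (the function name and variables are just for illustration):

```python
import numpy as np

def l0_objective(P, D, c, lam):
    """Evaluate ||P - D c||_2^2 + lam * ||c||_0 for P of shape (N,), D of shape (N, M), c of shape (M,)."""
    residual = P - D @ c
    l0_norm = np.count_nonzero(c)  # ||c||_0 = number of nonzero coefficients
    return residual @ residual + lam * l0_norm
```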

I learnt that this problem is NP-hard, but I don’t understand why.

What I think:

There are in total $2^M - 1$ feature subsets that we need to consider:

$$\binom{M}{1} + \binom{M}{2} + \cdots + \binom{M}{M} = 2^M - 1,$$

in which $M$ is the dimension of the coefficient vector $c$ and $\binom{M}{n} = \frac{M!}{(M-n)!\,n!}$.

For each subset of selected features, we perform OLS (without taking the regularizer into account) and record the loss:
$$L = \text{RMSE} + \lambda \|c\|_0.$$

After doing all $2^M - 1$ such calculations, we can choose the subset of features that yields the smallest $L$.

However, I don’t know whether the result obtained in this way really is the solution to the original objective function, since we don’t take the effect of the regularizer into account in the first place.

Or maybe this algorithm does not make sense at all: instead of comparing $L = \text{RMSE} + \lambda \|c\|_0$ between feature subsets of different sizes, we can only compare the RMSE among subsets of the same size.
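Here is a minimal sketch of the exhaustive search described above, assuming NumPy (the function `best_subset_l0` is a hypothetical name, and it scores each fit with the squared-error term from the objective rather than RMSE):

```python
import itertools
import numpy as np

def best_subset_l0(P, D, lam):
    """Brute-force search over all non-empty feature subsets.

    For each subset, fit OLS on those columns only (ignoring the regularizer during
    fitting), then score the result with the full penalized objective.  The number
    of subsets is 2^M - 1, so this is only feasible for small M.
    """
    N, M = D.shape
    best_c, best_loss = None, np.inf
    for k in range(1, M + 1):
        for subset in itertools.combinations(range(M), k):
            cols = list(subset)
            # OLS restricted to the selected columns
            coef, *_ = np.linalg.lstsq(D[:, cols], P, rcond=None)
            c = np.zeros(M)
            c[cols] = coef
            residual = P - D @ c
            loss = residual @ residual + lam * np.count_nonzero(c)
            if loss < best_loss:
                best_c, best_loss = c, loss
    return best_c, best_loss
```

The exponential number of subsets is exactly why this exhaustive approach does not scale; that is consistent with (though of course not a proof of) the NP-hardness result discussed in the answer below.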

Answer

First, note that NP-hardness is a property of a problem rather than of a specific algorithm: it puts bounds on the performance of any algorithm for that problem. The proof works by establishing relationships between problems to prove membership in complexity classes.

Background

A complexity class is a set of computational problems defined by 1) a computational resource of interest (e.g. time, memory, etc.), 2) a particular model of computation (e.g. a Turing machine), and 3) a bound describing how the resources needed to compute a solution scale with the problem size.

NP is the set of all decision problems where, if the answer is “yes”, there exists a proof that can be verified in polynomial time by a deterministic Turing machine. Informally, this means it’s possible to efficiently check the answer, even if it’s not possible to efficiently produce the answer in the first place.

A problem q is NP-complete if it’s in NP, and every problem r in NP can be reduced to q in polynomial time. This means that there exists an efficient algorithm for transforming r into q. So, if an efficient algorithm for solving q exists, it can also be used to solve r efficiently. Informally, if a can be reduced to b, then b is at least as difficult to solve as a. So, NP-complete problems are the most difficult problems in NP.

A problem q is NP-hard if every NP-complete problem can be reduced to it in polynomial time. However, q need not necessarily be in NP. Informally, NP-hard problems are at least as difficult as the most difficult problems in NP.

Sparse regression

The NP-hardness of sparse regression is proved in this paper:

Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2), 227-234.

They call the sparse regression problem SAS (for sparse approximate solution). The proof of NP-hardness works by reducing a problem called “exact cover with 3-sets” (a.k.a. X3C) to SAS. They explicitly show how to transform any X3C problem into an SAS problem. X3C is known to be NP-complete, which implies that SAS is NP-hard.
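As a toy illustration of the flavor of such a reduction (this is my own sketch, not the paper's construction verbatim): each 3-set becomes a column of the matrix, the target is the all-ones vector, and asking for an exact fit with at most $q = |\text{universe}|/3$ nonzero coefficients amounts to asking for an exact cover.

```python
import itertools
import numpy as np

def x3c_to_sparse_system(universe_size, three_sets):
    """Encode an X3C instance: columns of A are incidence vectors of the 3-sets,
    b is the all-ones vector, and q is the number of sets an exact cover must use."""
    A = np.zeros((universe_size, len(three_sets)))
    for j, s in enumerate(three_sets):
        A[list(s), j] = 1.0
    return A, np.ones(universe_size), universe_size // 3

# Tiny instance: universe {0,...,5}; sets 0 and 2 form an exact cover.
A, b, q = x3c_to_sparse_system(6, [(0, 1, 2), (1, 2, 3), (3, 4, 5)])

# Brute-force check for an exact solution with at most q nonzeros (toy sizes only).
found = any(
    np.allclose(A[:, list(S)] @ np.linalg.lstsq(A[:, list(S)], b, rcond=None)[0], b)
    for k in range(1, q + 1)
    for S in itertools.combinations(range(A.shape[1]), k)
)
print(found)  # True: an exact cover exists, so a q-sparse exact solution exists
```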

Attribution
Source: Link, Question Author: meTchaikovsky, Answer Author: user20160
