I’m trying to predict a response variable in linear regression that should always be positive (cost per click). It’s a monetary amount. In AdWords, you pay Google for clicks on your ads, and a negative number would mean that Google pays you when people click 😛

The predictors are all continuous values. The Rsquared and RMSE are decent when compared to other models, even out-of-sample:

`RMSE = 1.4141477, Rsquared = 0.8207303`

I cannot rescale the predictions, because it’s money, so even a small rescaling factor could change costs significantly.

As far as I understand, for the regression model there’s nothing special about zero and negative numbers, so it finds the best regression hyperplane no matter whether the output is partly negative.

This is a first attempt, using all the variables I have, so there’s room for refinement.

Is there any way to tell the model that the output cannot be negative?

**Answer**

I assume that you are using the OLS estimator on this linear regression model. You can use the *inequality constrained least-squares estimator*, which is the solution to a minimization problem under inequality constraints. Using standard matrix notation (vectors are column vectors), the minimization problem is stated as

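$$\min_{\beta}\ (\mathbf y-\mathbf X\beta)'(\mathbf y-\mathbf X\beta) \quad \text{s.t.} \quad \mathbf Z\beta \geq \mathbf 0$$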

…where $\mathbf y$ is $n \times 1$, $\mathbf X$ is $n\times k$, $\beta$ is $k\times 1$ and $\mathbf Z$ is the $m \times k$ matrix containing the out-of-sample regressor series of length $m$ that are used for prediction. We have $m$ linear inequality constraints (and the objective function is convex, so the first order conditions are sufficient for a minimum).
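As a rough numerical illustration of this problem, here is a minimal Python sketch that hands the objective and the constraint $\mathbf Z\beta \geq \mathbf 0$ to a general-purpose solver. The choice of SciPy's SLSQP method, the helper name `constrained_ls` and the synthetic data are all my own assumptions for the example; none of them come from the question or from Liew (1976).

```python
import numpy as np
from scipy.optimize import minimize


def constrained_ls(X, y, Z):
    """Minimise (y - Xb)'(y - Xb) subject to Z b >= 0 (hypothetical helper)."""
    # Unconstrained OLS solution, used as the starting point.
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    def objective(b):
        return np.sum((y - X @ b) ** 2)

    def gradient(b):
        return -2.0 * X.T @ (y - X @ b)

    # One inequality per row of Z: the predicted value must be non-negative.
    constraints = [{"type": "ineq", "fun": lambda b: Z @ b, "jac": lambda b: Z}]
    result = minimize(objective, b_ols, jac=gradient,
                      constraints=constraints, method="SLSQP")
    return b_ols, result.x


# Synthetic data (assumed): true intercept is negative, so OLS predicts
# negative values for small x.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = -1.0 + 0.5 * X[:, 1] + rng.normal(0, 1.0, n)

# Out-of-sample regressors at which predictions must stay non-negative.
Z = np.column_stack([np.ones(5), np.array([0.0, 0.5, 1.0, 5.0, 9.0])])

b_ols, b_r = constrained_ls(X, y, Z)
print("OLS predictions:        ", Z @ b_ols)
print("Constrained predictions:", Z @ b_r)
```

Note that the constraint only guarantees non-negative predictions at the rows of `Z`; other regressor values can still produce negative predictions, and the in-sample fit can only get worse (or stay equal) relative to OLS.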

The Lagrangean of this problem is

$$\begin{aligned}
L &= (\mathbf y-\mathbf X\beta)'(\mathbf y-\mathbf X\beta) -\lambda'\mathbf Z\beta \\
&= \mathbf y'\mathbf y-\mathbf y'\mathbf X\beta - \beta'\mathbf X'\mathbf y+ \beta'\mathbf X'\mathbf X\beta-\lambda'\mathbf Z\beta \\
&= \mathbf y'\mathbf y - 2\beta'\mathbf X'\mathbf y+ \beta'\mathbf X'\mathbf X\beta-\lambda'\mathbf Z\beta
\end{aligned}$$

where $\lambda$ is an $m \times 1$ column vector of non-negative Karush-Kuhn-Tucker multipliers. The first order conditions are (you may want to review rules for matrix and vector differentiation)

$$\frac {\partial L}{\partial \beta}= \mathbf 0\Rightarrow - 2\mathbf X'\mathbf y +2\mathbf X'\mathbf X\beta - \mathbf Z'\lambda = \mathbf 0$$

$$\Rightarrow \hat \beta_R = \left(\mathbf X'\mathbf X\right)^{-1}\mathbf X'\mathbf y + \frac 12\left(\mathbf X'\mathbf X\right)^{-1}\mathbf Z'\lambda = \hat \beta_{OLS}+ \left(\mathbf X'\mathbf X\right)^{-1}\mathbf Z'\xi \qquad [1]$$

…where $\xi = \frac 12 \lambda$, for convenience, and $\hat \beta_{OLS}$ is the estimator we would obtain from ordinary least squares estimation.
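To make [1] concrete, the sketch below computes a non-negative $\xi$ numerically and then builds $\hat\beta_R$ from $\hat\beta_{OLS}$ exactly as in [1]. It relies on the remaining Karush-Kuhn-Tucker conditions ($\xi \geq \mathbf 0$, $\mathbf Z\hat\beta_R \geq \mathbf 0$ and $\xi'\mathbf Z\hat\beta_R = 0$), which can be rewritten as a non-negative least-squares problem in $\xi$. This reformulation, the use of `scipy.optimize.nnls`, the helper name `icls_via_multipliers` and the synthetic data are my own assumptions for illustration, and not necessarily the route taken in Liew (1976).

```python
import numpy as np
from scipy.optimize import nnls


def icls_via_multipliers(X, y, Z):
    """Return (beta_ols, xi, beta_r) with beta_r built as in equation [1].

    Hypothetical helper: xi >= 0 minimises xi' M xi + 2 q' xi with
    M = Z (X'X)^{-1} Z' and q = Z beta_ols, written as an NNLS problem.
    """
    A = X.T @ X
    beta_ols = np.linalg.solve(A, X.T @ y)

    L = np.linalg.cholesky(A)        # A = L L'
    C = np.linalg.solve(L, Z.T)      # so that C'C = Z (X'X)^{-1} Z'
    d = -L.T @ beta_ols              # so that C'd = -Z beta_ols

    xi, _ = nnls(C, d)               # non-negative multipliers
    beta_r = beta_ols + np.linalg.solve(A, Z.T @ xi)   # equation [1]
    return beta_ols, xi, beta_r


# Synthetic data (assumed): OLS predicts negative values for small x.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = -1.0 + 0.5 * X[:, 1] + rng.normal(0, 1.0, n)
Z = np.column_stack([np.ones(5), np.array([0.0, 0.5, 1.0, 5.0, 9.0])])

beta_ols, xi, beta_r = icls_via_multipliers(X, y, Z)
print("xi:                     ", xi)
print("OLS predictions:        ", Z @ beta_ols)
print("Constrained predictions:", Z @ beta_r)
```

A multiplier is positive only for a constraint that binds (a prediction pushed exactly to zero). When OLS already satisfies every constraint, all multipliers are zero and [1] collapses to $\hat\beta_R = \hat\beta_{OLS}$.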

The method is fully elaborated in Liew (1976).

**Attribution** *Source: Link, Question Author: usillos, Answer Author: Alecos Papadopoulos*