# Why regularize all parameters in the same way?

My question relates to regularization in linear regression and logistic regression. I’m currently doing week 3 of Andrew Ng’s Machine Learning course on Coursera. I understand how overfitting can be a common problem, and I have some intuition for how regularization can reduce it. My question is: can we improve our models by regularizing different parameters in different ways?

Example:

Let’s say we’re trying to fit $w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$. This question is about why we penalize high $w_1$ values in the same way that we penalize high $w_2$ values.

If we know nothing about how our features $(x_1, x_2, x_3, x_4)$ were constructed, it makes sense to treat them all in the same way when we do regularization: a high $w_1$ value should yield as much “penalty” as a high $w_3$ value.
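To make the uniform treatment concrete, here is a minimal sketch of a regularized least-squares cost in the style taught in the course, where a single scalar `lam` multiplies the sum of squared coefficients. The function name `ridge_cost` and the exact scaling conventions are my own illustration, not something fixed by the course:

```python
import numpy as np

def ridge_cost(w, X, y, lam):
    """Least-squares cost with a uniform L2 penalty.

    Every coefficient except the intercept w[0] is penalized by the
    same factor lam, regardless of which feature it multiplies.
    X is assumed to have a leading column of ones for the intercept.
    """
    residual = X @ w - y
    penalty = lam * np.sum(w[1:] ** 2)  # same weight on every w_j, j >= 1
    return 0.5 * np.mean(residual ** 2) + penalty
```

Note that here a high $w_1$ and a high $w_3$ contribute to the penalty in exactly the same way, which is the uniform treatment described above.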

But let’s say we have additional information: suppose we only had 2 features originally, $x_1$ and $x_2$. A line was underfitting our training set and we wanted a more squiggly decision boundary, so we constructed $x_3 = x_1^2$ and $x_4 = x_2^3$. Now we can have more complex models, but the more complex they get, the more we risk overfitting to the training data. So we want to strike a balance between minimizing the cost function and minimizing model complexity. The terms built from higher powers ($x_3$, $x_4$) are the ones that drastically increase the complexity of our model. So shouldn’t we penalize high $w_3$, $w_4$ values more than we penalize high $w_1$, $w_2$ values?
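The idea in the question can be sketched by replacing the single scalar `lam` with a per-coefficient vector of penalty weights, so the coefficients on the constructed high-order features can be penalized harder. The function name `weighted_ridge_cost` and the particular weight values are hypothetical choices for illustration:

```python
import numpy as np

def weighted_ridge_cost(w, X, y, lam):
    """Least-squares cost with a per-coefficient L2 penalty.

    lam is a vector: lam[j] is the penalty weight on w[j + 1]
    (the intercept w[0] is left unpenalized). Larger lam entries
    can be assigned to coefficients of high-order features such
    as x_3 = x_1**2 and x_4 = x_2**3.
    """
    residual = X @ w - y
    penalty = np.sum(lam * w[1:] ** 2)  # elementwise weights
    return 0.5 * np.mean(residual ** 2) + penalty

# e.g. lam = 0.01 * np.array([1.0, 1.0, 10.0, 10.0])
# would penalize w_3 and w_4 ten times harder than w_1 and w_2.
```

With all entries of `lam` equal, this reduces to the uniform penalty from the course; unequal entries implement exactly the differential penalty the question asks about.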