I did linear regression with and without a regularization parameter (ridge) and found that regularization improves the regression accuracy for only some of my test data, while the error goes up for the rest. So it seems that regularization is harmful in some cases. I am therefore thinking of making the ridge parameter a function of “something” … Read more
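One standard way to make the ridge parameter a function of the data is to choose it by cross-validation rather than fixing it in advance; a minimal numpy sketch on synthetic data (the λ grid, fold count, and helper names are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=2.0, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam*I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean squared validation error over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return np.mean(errs)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam))
```

Because λ = 0 (plain OLS) is in the grid, the selected λ can never do worse than no regularization on the validation folds, which addresses the "destructive in some cases" concern.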

## Relevance of ridge regression in very large datasets

Ridge regression shrinks regression coefficients by a set proportion, and never to zero. Given this, its key benefit seems to be to reduce prediction error by decreasing the sampling variability of the parameter vector, at the cost of some bias. Given data sets of 100,000 records or larger, the sampling variability of the parameter error … Read more
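The intuition that the benefit fades with sample size can be checked on synthetic data; a sketch (fixed λ, arbitrary dimensions) comparing the ridge-vs-OLS coefficient gap at two sample sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
p, lam = 5, 10.0
beta = np.ones(p)

def fit(X, y, lam):
    # Closed-form (X^T X + lam*I)^{-1} X^T y; lam=0 gives OLS
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def gap(n):
    """Distance between ridge and OLS estimates at sample size n."""
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    return np.linalg.norm(fit(X, y, lam) - fit(X, y, 0.0))

gap_small, gap_large = gap(100), gap(100_000)
```

With λ held fixed, X^T X grows linearly in n while λI stays put, so the ridge and OLS fits become indistinguishable at 100,000 records, matching the question's premise.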

## Bayesian “confidence intervals” for non-spline ridge regression?

Wahba (1983) and Silverman (1985) show that the quadratic penalty term on a smoothing spline is akin to a Bayesian prior on the smoothness of the model. Nychka (1988) is another key reference. This is made a little less arcane by Simon Wood (see section 4.8). Consider the model $$y = \mathbf{X\beta} + \epsilon$$ … Read more
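The equivalence can be checked numerically for plain (non-spline) ridge regression: the ridge estimate is the posterior mean under the prior β ~ N(0, (σ²/λ)I) with Gaussian noise, and the posterior covariance σ²(XᵀX + λI)⁻¹ yields Bayesian "confidence" intervals. A sketch with assumed σ² and λ:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam, sigma2 = 50, 3, 2.0, 1.0
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Ridge estimate: (X^T X + lam*I)^{-1} X^T y
A = X.T @ X + lam * np.eye(p)
w_ridge = np.linalg.solve(A, X.T @ y)

# Posterior under beta ~ N(0, (sigma2/lam) I), y | beta ~ N(X beta, sigma2 I):
# mean = (X^T X + lam*I)^{-1} X^T y, covariance = sigma2 * (X^T X + lam*I)^{-1}
post_mean = np.linalg.solve(A, X.T @ y)
post_cov = sigma2 * np.linalg.inv(A)

# 95% Bayesian credible intervals for each coefficient
half_width = 1.96 * np.sqrt(np.diag(post_cov))
intervals = np.column_stack([post_mean - half_width, post_mean + half_width])
```

The ridge point estimate and the posterior mean coincide exactly, which is the non-spline analogue of the Wahba/Silverman result.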

## Clarifying Dual Representation

$$w = (X^T X + \lambda I_n)^{-1} X^T y = \sum^m_{i=1} \alpha_i x_i$$ $$\alpha = (\alpha_1, \dots, \alpha_m)^T = (XX^T + \lambda I_m)^{-1} y$$ From what I understand, the above demonstrates the conversion from primal to dual form. However, I am at a loss as to the definition and meaning of each … Read more
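The two forms can be verified against each other numerically; a small sketch (random data, arbitrary λ) showing that the weight vector recovered from the dual coefficients, $w = X^T\alpha = \sum_i \alpha_i x_i$, matches the primal solution:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, lam = 6, 4, 0.5              # m samples, n features
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

# Primal form: w = (X^T X + lam*I_n)^{-1} X^T y  -- an n x n system
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual form: alpha = (X X^T + lam*I_m)^{-1} y    -- an m x m system
alpha = np.linalg.solve(X @ X.T + lam * np.eye(m), y)

# Recover w as a weighted sum of training points: w = sum_i alpha_i * x_i
w_dual = X.T @ alpha
```

The dual solves an m×m system instead of n×n, which is why the dual representation matters when there are far more features than samples (and it only touches the data through inner products, enabling kernels).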

## Ridge regression using stochastic gradient descent in Python

I am trying to implement a solution to Ridge regression in Python using Stochastic gradient descent as the solver. My code for SGD is as follows:

```python
def fit(self, X, Y):
    # Convert to data frame in case X is numpy matrix
    X = pd.DataFrame(X)
    # Prepend a column of 1s to the data for the
```

… Read more
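For comparison, a self-contained numpy-only sketch of ridge via SGD (hypothetical `fit_sgd_ridge`, synthetic data; the learning rate and epoch count are arbitrary, and the intercept column is excluded from the penalty, which is the usual convention):

```python
import numpy as np

def fit_sgd_ridge(X, Y, lam=0.1, lr=0.01, epochs=200, seed=0):
    """Ridge regression by SGD on per-sample squared error plus lam*||w||^2,
    with the intercept left unpenalized."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(X)), np.asarray(X, float)])  # prepend 1s
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(Y)):        # shuffle each epoch
            xi, yi = X[i], Y[i]
            grad = (xi @ w - yi) * xi            # gradient of 0.5*(xi.w - yi)^2
            grad[1:] += lam * w[1:]              # do not penalize the intercept
            w -= lr * grad
    return w

rng = np.random.default_rng(4)
Xd = rng.normal(size=(200, 3))
Yd = 2.0 + Xd @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
w_hat = fit_sgd_ridge(Xd, Yd, lam=0.01)
```

With well-scaled features and a small constant step size, the iterates settle near the closed-form ridge solution.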

## Poor regression results with LASSO which improves after variable selection – could someone shed some light on the observations?

This is my first post here – I’m not a statistician by training, though I have a machine learning background, so please correct any erroneous use of statistical terminology. Currently, I have a (linear) regression problem I’m trying to solve: ~5000 predictors and ~200 examples, which is an extremely ill-posed … Read more
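The ill-posedness is easy to see directly: with ~200 examples and ~5000 predictors, XᵀX has rank at most 200 and the OLS normal equations are singular, while the penalized system is always invertible. A sketch (synthetic data, arbitrary λ), using the dual form, which is cheaper when p ≫ n:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 5000                 # ~200 examples, ~5000 predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X^T X is p x p but has rank at most n, so OLS has no unique solution.
rank = np.linalg.matrix_rank(X)

# Adding lam*I makes the system nonsingular; solve the n x n dual system
# and map back: w = X^T (X X^T + lam*I_n)^{-1} y
lam = 1.0
w = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
```

This explains why some penalization (or prior variable selection, as the poster tried) is unavoidable here: it is not just helpful but required for the problem to have a unique answer.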

## Variable importance in cases of multicollinearity: OLS vs ridge regression

I have read that when using Ordinary Least Squares (OLS) for multiple linear regression, the coefficients/weights are unreliable for predictor variables that are collinear. I was wondering if this is also the case for regularisation methods (ridge/lasso/elastic net regression) when variables are collinear? Or could the coefficients/weights be used to determine relative importance of the … Read more
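A small simulation illustrates the point: with two nearly identical predictors, OLS splits the weight between them almost arbitrarily, while ridge divides it much more evenly, so in neither case do the raw coefficients cleanly rank "importance". A sketch (synthetic data, arbitrary λ):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

def coefs(lam):
    # lam = 0 gives OLS; lam > 0 gives ridge
    return np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

w_ols, w_ridge = coefs(0.0), coefs(10.0)

# OLS scatters weight wildly between the two copies; ridge equalizes them.
spread_ols = abs(w_ols[0] - w_ols[1])
spread_ridge = abs(w_ridge[0] - w_ridge[1])
```

Ridge stabilizes the coefficients (small spread between the duplicates), but that stability reflects the penalty's preference for equal splits, not evidence that the two variables matter equally.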

## Large ridge regression

The following question pertains to a large-scale ridge regression. I am stumped by it; does anyone have an idea? Thanks. Suppose the data for the ridge regression problem become available sequentially, i.e. the $k$th data point $x_k$ arrives at time $t_k$. At time $t_k$ we want to be able to compute … Read more
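One standard answer is to update the inverse recursively with the Sherman-Morrison formula, giving the exact ridge solution after each arrival in O(p²) per point instead of re-solving from scratch; a sketch (hypothetical `OnlineRidge` class, synthetic data):

```python
import numpy as np

class OnlineRidge:
    """Maintain w_k = (X_k^T X_k + lam*I)^{-1} X_k^T y_k as points arrive,
    updating the stored inverse via Sherman-Morrison in O(p^2) per point."""
    def __init__(self, p, lam=1.0):
        self.A_inv = np.eye(p) / lam        # (lam*I)^{-1} before any data
        self.b = np.zeros(p)                # running X^T y

    def update(self, x, y):
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b += y * x
        return self.A_inv @ self.b          # current ridge estimate w_k

rng = np.random.default_rng(7)
p, lam = 4, 1.0
X = rng.normal(size=(30, p))
y = rng.normal(size=30)

model = OnlineRidge(p, lam)
for xk, yk in zip(X, y):
    w_online = model.update(xk, yk)

# Batch solution on all 30 points, for comparison
w_batch = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

After processing all points sequentially, the online estimate matches the batch closed-form solution.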

## Linear regression on circulant matrices

I am reading the paper on High-Speed Tracking with Kernelized Correlation Filters and I am a bit stuck on the equivalence of Ridge regression in the frequency domain. The typical Ridge regression objective to minimize is $$\min_{\textbf{w}} \left \| \textbf{X}\textbf{w} - \textbf{y} \right \|^2 + \lambda \left \| \textbf{w} \right \|^2.$$ This can be solved with the closed form solution $$\textbf{w} = \left ( \textbf{X}^T \textbf{X} + \lambda \textbf{I}\right )^{-1} \textbf{X}^T \textbf{y}$$ … Read more
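For the special case in that paper where X is circulant (every row a cyclic shift of a base sample x), the DFT diagonalizes X, so the ridge solution can be computed elementwise in the frequency domain as ŵ = x̂* ⊙ ŷ / (|x̂|² + λ); a numpy sketch verifying the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(8)
n, lam = 8, 0.1
x = rng.normal(size=n)               # base sample; rows of X are cyclic shifts
y = rng.normal(size=n)

# Circulant data matrix: X[i, j] = x[(i - j) mod n]
X = np.array([[x[(i - j) % n] for j in range(n)] for i in range(n)])

# Spatial-domain closed form: w = (X^T X + lam*I)^{-1} X^T y
w_direct = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Frequency-domain form: elementwise division after diagonalization by the DFT
x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
w_fft = np.fft.ifft(np.conj(x_hat) * y_hat / (np.abs(x_hat) ** 2 + lam)).real
```

The frequency-domain version replaces an O(n³) matrix solve with FFTs and an elementwise division, which is the source of the "high-speed" in the paper's title.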

## Is it possible that ridge logistic regression will also reduce coefficients to exactly zero? [closed]

I have 105 predictors which contain dummy, numerical, and nominal variables. The output variable is dichotomous. I ran ridge logistic … Read more
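The short answer is no: an L2 (ridge) penalty shrinks coefficients toward zero but does not produce exact zeros, whereas an L1 penalty does. A numpy-only sketch (plain gradient descent for the L2 fit, proximal gradient with soft-thresholding for L1; synthetic data with mostly irrelevant predictors, arbitrary penalties and step sizes):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 300, 10
X = rng.normal(size=(n, p))
beta_true = np.concatenate([[2.0, -2.0], np.zeros(p - 2)])  # 8 null predictors
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

def grad_loglik(w):
    """Gradient of the average logistic negative log-likelihood."""
    prob = 1 / (1 + np.exp(-X @ w))
    return X.T @ (prob - y) / n

# L2-penalized (ridge) logistic regression: gradient descent
w_l2 = np.zeros(p)
for _ in range(2000):
    w_l2 -= 0.5 * (grad_loglik(w_l2) + 0.1 * w_l2)

# L1-penalized (lasso) logistic regression: proximal gradient
w_l1, lr, lam = np.zeros(p), 0.5, 0.1
for _ in range(2000):
    u = w_l1 - lr * grad_loglik(w_l1)
    w_l1 = np.sign(u) * np.maximum(np.abs(u) - lr * lam, 0.0)  # soft-threshold

n_zero_l2 = int(np.sum(w_l2 == 0.0))
n_zero_l1 = int(np.sum(w_l1 == 0.0))
```

The soft-thresholding step sets small coefficients exactly to zero, while the smooth L2 update never does; any "zeros" from a ridge fit are a display or rounding artifact, not true sparsity.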