# Logistic Regression: Scikit Learn vs glmnet

I am trying to duplicate the results of sklearn's logistic regression using the glmnet package in R.

From the sklearn logistic regression documentation, the implementation minimizes the cost function under an l2 penalty:

$$\min_{w,c} \frac{1}{2} w^T w + C \sum_{i=1}^{N} \log\left(\exp\left(-y_i \left(x_i^T w + c\right)\right) + 1\right)$$

From the vignettes of glmnet, its implementation minimizes a slightly different cost function:

$$\min_{\beta_0,\beta} -\left[\frac{1}{N} \sum_{i=1}^{N} y_i \left(\beta_0 + x_i^T \beta\right) - \log\left(1 + e^{\beta_0 + x_i^T \beta}\right)\right] + \lambda \left[\frac{1-\alpha}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1\right]$$

With some tweaking of the second equation (multiplying through by $\frac{1}{\lambda}$), and by setting $\alpha=0$,

$$\min_{\beta_0,\beta} -\frac{1}{N\lambda} \sum_{i=1}^{N} \left[y_i \left(\beta_0 + x_i^T \beta\right) - \log\left(1 + e^{\beta_0 + x_i^T \beta}\right)\right] + \frac{1}{2} \|\beta\|_2^2$$

which differs from the sklearn cost function only by a factor of $\lambda$ if we set $\frac{1}{N\lambda}=C$, so I was expecting the same coefficient estimates from the two packages. But they are different. I am using the dataset from the UCLA idre tutorial, predicting `admit` based on `gre`, `gpa` and `rank`. There are 400 observations, so with $C=1$, $\lambda = \frac{1}{400} = 0.0025$.
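The $C \leftrightarrow \lambda$ mapping above can be checked with a quick sketch (the variable names are illustrative, not from either library):

```python
# The claimed correspondence: sklearn's C equals 1 / (N * lambda),
# so with N = 400 observations and C = 1 the matching glmnet
# penalty should be lambda = 1 / (N * C).
N = 400
C = 1.0
lam = 1.0 / (N * C)
print(lam)  # 0.0025
```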

```python
# python sklearn
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
y, X = dmatrices('admit ~ gre + gpa + C(rank)', df, return_type='dataframe')
X.head()
#    Intercept  C(rank)[T.2]  C(rank)[T.3]  C(rank)[T.4]  gre   gpa
# 0          1             0             1             0  380  3.61
# 1          1             0             1             0  660  3.67
# 2          1             0             0             0  800  4.00
# 3          1             0             0             1  640  3.19
# 4          1             0             0             1  520  2.93

model = LogisticRegression(fit_intercept=False, C=1)
mdl = model.fit(X, y)
model.coef_
# array([[-1.35417783, -0.71628751, -1.26038726, -1.49762706,  0.00169198,
#          0.13992661]])
# corresponding to predictors [Intercept, rank_2, rank_3, rank_4, gre, gpa]
```

```r
# R glmnet
library(data.table)
library(glmnet)

df = fread("https://stats.idre.ucla.edu/stat/data/binary.csv")
X = as.matrix(model.matrix(admit ~ gre + gpa + as.factor(rank), data = df))[, 2:6]
y = df[, admit]
mylogit <- glmnet(X, y, family = "binomial", alpha = 0)
coef(mylogit, s = 0.0025)
# 6 x 1 sparse Matrix of class "dgCMatrix"
#                             1
# (Intercept)      -3.984226893
# gre               0.002216795
# gpa               0.772048342
# as.factor(rank)2 -0.530731081
# as.factor(rank)3 -1.164306231
# as.factor(rank)4 -1.354160642
```


The R output is fairly close to that of a logistic regression without regularization, as can be seen here. Am I missing something, or doing something obviously wrong?
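One way to see what an effectively unregularized sklearn fit looks like is to make the penalty negligible by setting $C$ very large (since $C$ multiplies the loss term, the penalty's relative weight shrinks as $C$ grows). A minimal sketch with synthetic data standing in for the admissions dataset:

```python
# Sketch: a very large C makes sklearn's l2 penalty negligible, so the fit
# approaches the maximum-likelihood (unregularized) solution. Synthetic data
# is used here for illustration, not the UCLA dataset loaded above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

# near-unregularized fit vs. a strongly penalized fit
weak = LogisticRegression(C=1e8, max_iter=5000).fit(X, y)
strong = LogisticRegression(C=0.01).fit(X, y)
print(weak.coef_)    # close to the MLE
print(strong.coef_)  # shrunk toward zero by the penalty
```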

Update: I also tried the LiblineaR package in R to run the same fit, and got yet another different set of estimates (liblinear is also sklearn's default solver):

```r
library(LiblineaR)

fit = LiblineaR(X, y, type = 0, cost = 1)
print(fit)
# $TypeDetail
# [1] "L2-regularized logistic regression primal (L2R_LR)"
#
# $Type
# [1] 0
#
# $W
#             gre          gpa as.factor(rank)2 as.factor(rank)3 as.factor(rank)4         Bias
# [1,] 0.00113215 7.321421e-06     5.354841e-07     1.353818e-06      9.59564e-07 2.395513e-06
```


Update 2: turning off standardization in glmnet gives:

```r
mylogit <- glmnet(X, y, family = "binomial", alpha = 0, standardize = FALSE)
coef(mylogit, s = 0.0025)
# 6 x 1 sparse Matrix of class "dgCMatrix"
#                              1
# (Intercept)      -2.8180677693
# gre               0.0034434192
# gpa               0.0001882333
# as.factor(rank)2  0.0001268816
# as.factor(rank)3 -0.0002259491
# as.factor(rank)4 -0.0002028832
```
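Standardization matters here because glmnet standardizes predictors by default while sklearn fits on the raw columns, and a penalized fit is not invariant to predictor scale. A small sketch (synthetic columns mimicking the very different scales of `gre` and `gpa`):

```python
# Sketch: the same l2-penalized logistic fit on raw vs. standardized
# predictors gives different coefficients, because the penalty treats all
# columns equally regardless of their scale. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(600, 100, 400),   # gre-like scale
                     rng.normal(3.4, 0.4, 400)])  # gpa-like scale
z = 0.01 * (X[:, 0] - 600) + 1.0 * (X[:, 1] - 3.4)
y = (z + rng.normal(size=400) > 0).astype(int)

raw = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
std = LogisticRegression(C=1.0, max_iter=1000).fit(
    StandardScaler().fit_transform(X), y)
print(raw.coef_)  # penalized fit on raw scales
print(std.coef_)  # penalized fit on standardized scales -- different
```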


Dougal's answer is correct: you regularize the intercept in sklearn but not in R. Make sure to use solver='newton-cg', since the default solver ('liblinear') always regularizes the intercept.
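A sketch of that fix: drop the explicit intercept column from the design matrix, let sklearn fit the intercept itself, and pick a solver such as 'newton-cg' that leaves it unpenalized, matching glmnet's treatment (synthetic data used for illustration):

```python
# Sketch of the fix: fit_intercept=True with solver='newton-cg' keeps the
# intercept out of the l2 penalty, as glmnet does. X must NOT contain an
# explicit intercept column in this setup. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = (X @ np.array([1.0, -0.5, 0.2]) - 1.0 + rng.normal(size=400) > 0).astype(int)

model = LogisticRegression(C=1.0, solver="newton-cg", fit_intercept=True)
model.fit(X, y)
print(model.intercept_)  # unpenalized intercept, fitted separately
print(model.coef_)       # penalized slope coefficients only
```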