Invariance of results when scaling explanatory variables in logistic regression, is there a proof?

There is a standard result for linear regression that the regression coefficients are given by

$$\beta = (X^T X)^{-1} X^T y \tag{1}$$

or

$$(X^T X)\beta = X^T y \tag{2}$$

Scaling the explanatory variables does not affect the predictions. I have tried to show this algebraically as follows.

The response is related to the explanatory variables via the matrix equation
$$y = X\beta \tag{3}$$

X is an n×(p+1) matrix of n observations on p explanatory variables. The first column of X is a column of ones.

Scale the explanatory variables with a (p+1)×(p+1) diagonal matrix D whose diagonal entries are the scaling factors (all nonzero, so that D is invertible):
$$X^s = XD \tag{4}$$

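$X^s$ and $\beta^s$ satisfy (2):

$$(D^T X^T X D)\beta^s = D^T X^T y$$

so, multiplying on the left by $(D^T)^{-1}$ (which exists when every scaling factor is nonzero),

$$X^T X D\beta^s = X^T y$$

$$D\beta^s = (X^T X)^{-1} X^T y = \beta$$

$$\beta^s = D^{-1}\beta \tag{5}$$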

This means that if an explanatory variable is scaled by $d_i$, the regression coefficient $\beta_i$ is scaled by $1/d_i$, and the effect of the scaling cancels out. That is, for predictions based on the scaled values, using (4), (5) and (3):

$$y^s = X^s\beta^s = X D D^{-1}\beta = X\beta = y$$
as expected.
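
The cancellation can also be checked numerically. What follows is only a minimal sketch: the choice of `hp` as the response, the scaling factor `d = 10`, and the helper names `fit_raw`, `fit_scaled`, `coef_ratio` and `same_fit` are assumptions made for illustration and are not part of the derivation.

```r
# Assumption: hp ~ mpg on the built-in mtcars data, scaling factor d = 10.
d <- 10
fit_raw <- lm(hp ~ mpg, data = mtcars)

scaled <- mtcars                      # work on a copy so mtcars stays untouched
scaled$mpg <- scaled$mpg * d
fit_scaled <- lm(hp ~ mpg, data = scaled)

# Ratio of unscaled to scaled coefficients: the intercept (scaling factor 1)
# should be unchanged, and the mpg slope should differ by the factor d.
coef_ratio <- coef(fit_raw) / coef(fit_scaled)
print(coef_ratio)

# The fitted values should agree up to floating-point error.
same_fit <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_scaled)))
print(same_fit)
```

Working on a copy rather than overwriting `mtcars$mpg` keeps both fits available for comparison.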

Now to the question.

For logistic regression without any regularization, fitting the model with and without scaling suggests that the same effect holds:

fit <- glm(vs ~ mpg, data=mtcars, family=binomial)

print(fit)

Coefficients:
(Intercept)          mpg  
    -8.8331       0.4304  

mtcars$mpg <- mtcars$mpg * 10

fit <- glm(vs ~ mpg, data=mtcars, family=binomial)

print(fit)

Coefficients:
(Intercept)          mpg  
   -8.83307      0.04304  

When the variable mpg is scaled up by a factor of 10, the corresponding coefficient is scaled down by a factor of 10, and the intercept is unchanged.
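
The printed coefficients show the rescaling, but not directly that the predictions agree. A small sketch of one way to check that, assuming the same `vs ~ mpg` model; the names `scaled`, `fit_raw`, `fit_scaled`, `same_prob` and `same_loglik` are introduced here for illustration only:

```r
# Fit on the original data and on a copy with mpg multiplied by 10.
fit_raw <- glm(vs ~ mpg, data = mtcars, family = binomial)

scaled <- transform(mtcars, mpg = mpg * 10)   # copy; mtcars is not modified
fit_scaled <- glm(vs ~ mpg, data = scaled, family = binomial)

# Fitted probabilities should match up to numerical tolerance.
same_prob <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_scaled)))

# The maximized log-likelihood should match as well.
same_loglik <- isTRUE(all.equal(as.numeric(logLik(fit_raw)),
                                as.numeric(logLik(fit_scaled))))

print(c(same_prob = same_prob, same_loglik = same_loglik))
```

This is numerical evidence for one data set and one scaling factor, not a proof.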

  1. How could this scaling property be proved (or disproved) algebraically for logistic regression?

I found a similar question relating to the effect on AUC when regularization is used.

  2. Is there any point to scaling explanatory variables in logistic regression, in the absence of regularization?

Answer

Here is a heuristic idea:

The likelihood for a logistic regression model is
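$$\mathcal{L}(\beta \mid y) = \prod_i \left(\frac{\exp(x_i\beta)}{1+\exp(x_i\beta)}\right)^{y_i} \left(\frac{1}{1+\exp(x_i\beta)}\right)^{1-y_i}$$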
and the MLE is the arg max of that likelihood. The likelihood depends on the coefficients only through the products $x_i\beta$, so when you scale a regressor you must rescale its coefficient by the inverse factor to reach the same maximal likelihood.
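
The heuristic can be turned into a short argument. The following is a sketch, assuming the scaled design matrix is $X^s = XD$ with $D$ diagonal and invertible, that $x_i$ denotes the $i$-th row of $X$, that $\mathcal{L}^s$ denotes the likelihood built from $X^s$, and that the MLE $\hat\beta$ exists and is unique:

```latex
\begin{align*}
\mathcal{L}^s(\beta^s \mid y)
  &= \prod_i \left(\frac{\exp(x_i D \beta^s)}{1+\exp(x_i D \beta^s)}\right)^{y_i}
             \left(\frac{1}{1+\exp(x_i D \beta^s)}\right)^{1-y_i}
   = \mathcal{L}(D\beta^s \mid y), \\
\max_{\beta^s} \mathcal{L}^s(\beta^s \mid y)
  &= \max_{\beta} \mathcal{L}(\beta \mid y)
  \quad\text{since } \beta^s \mapsto D\beta^s \text{ is a bijection}, \\
D\hat\beta^s &= \hat\beta
  \;\Longrightarrow\;
  \hat\beta^s = D^{-1}\hat\beta, \qquad
  X^s\hat\beta^s = X D D^{-1}\hat\beta = X\hat\beta .
\end{align*}
```

The fitted linear predictors, and hence the fitted probabilities, are therefore unchanged. The argument relies only on the likelihood depending on the coefficients through $X\beta$, so it does not carry over unchanged once a penalty on the coefficients themselves is added.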

Attribution
Source : Link , Question Author : PM. , Answer Author : Christoph Hanck
