# Does standardising independent variables reduce collinearity?

I’ve come across a very good text on Bayes/MCMC. It suggests that standardising your independent variables will make an MCMC (Metropolis) algorithm more efficient, but also that it may reduce (multi)collinearity. Can that be true? Is this something I should be doing as standard? (Sorry.)

Kruschke 2011, Doing Bayesian Data Analysis. (AP)

edit: for example

``````
> data(longley)
> cor.test(longley$Unemployed, longley$Armed.Forces)

    Pearson's product-moment correlation

data:  longley$Unemployed and longley$Armed.Forces
t = -0.6745, df = 14, p-value = 0.5109
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6187113  0.3489766
sample estimates:
       cor
-0.1774206

> standardise <- function(x) {(x-mean(x))/sd(x)}
> cor.test(standardise(longley$Unemployed), standardise(longley$Armed.Forces))

    Pearson's product-moment correlation

data:  standardise(longley$Unemployed) and standardise(longley$Armed.Forces)
t = -0.6745, df = 14, p-value = 0.5109
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6187113  0.3489766
sample estimates:
       cor
-0.1774206
``````

Standardising hasn’t changed the correlation, and therefore hasn’t reduced the (albeit limited) linear dependence between the two vectors.

What’s going on?


Centring doesn’t change the collinearity between the main effects at all. Scaling doesn’t either. No linear transformation of the individual variables changes the strength of their correlation. What standardising changes is the correlation between main effects and their interactions. Even if A and B are independent with a correlation of 0, the correlation between A and A:B depends on the means and SDs of the variables.
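To see where that dependence comes from, here is a rough sketch rather than anything definitive. It assumes A and B are independent and that A has a mean of 0, in which case the correlation between A and A*B works out to mean(B) / sqrt(mean(B)^2 + sd(B)^2). The helper names `cor_with_interaction` and `predicted`, the seed, and the sample size are my own choices for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility
n <- 1e5      # large sample so empirical values sit close to the formula

a <- rnorm(n, mean = 0, sd = 1)

# Hypothetical helper: empirical cor(a, a*b) for a fresh b ~ N(mu_b, sd_b)
cor_with_interaction <- function(mu_b, sd_b) {
  b <- rnorm(n, mean = mu_b, sd = sd_b)
  cor(a, a * b)
}

# Hypothetical helper: value expected under independence with E[a] = 0
predicted <- function(mu_b, sd_b) mu_b / sqrt(mu_b^2 + sd_b^2)

mu_values   <- c(0, 1, 5, 20)
empirical   <- sapply(mu_values, cor_with_interaction, sd_b = 2)
theoretical <- predicted(mu_values, sd_b = 2)

# Side-by-side comparison: the correlation grows as mean(b) moves away from 0
round(cbind(mu_b = mu_values, empirical, theoretical), 3)
```

The only thing that varies across rows is the mean of B, so the table isolates the effect of location: the correlation is near 0 when B is centred and heads toward 1 as the mean grows relative to the SD.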

Try the following in an R console. Note that `rnorm` just draws random values from a normal distribution with the population mean and SD you set, in this case 50 values per variable. The `scale` function standardizes a sample to a mean of 0 and an SD of 1.

``````
set.seed(1)  # fixing the seed makes the samples reproducible; you can try others
a <- rnorm(50, mean = 0, sd = 1)
b <- rnorm(50, mean = 0, sd = 1)
mean(a); mean(b)
# [1] 0.1004483  # not the population mean, just a sample
# [1] 0.1173265
cor(a, b)
# [1] -0.03908718
``````

The incidental correlation is near 0 for these independent samples. Now standardize both to a mean of 0 and an SD of 1.

``````
a <- scale(a)
b <- scale(b)
cor(a, b)
# [1,] -0.03908718
``````

Again, this is the exact same value even though the mean is 0 and SD = 1 for both `a` and `b`.

``````
cor(a, a*b)
# [1,] -0.01038144
``````

This is also very near 0 (`a*b` can be considered the interaction term).

However, usually the SD and mean of predictors differ quite a bit so let’s change `b`. Instead of taking a new sample I’ll rescale the original `b` to have a mean of 5 and SD of 2.

``````
b <- b * 2 + 5
cor(a, b)
# [1,] -0.03908718
``````

Again, that familiar correlation we’ve seen all along. The scaling is having no impact on the correlation between `a` and `b`. But!!

``````
cor(a, a*b)
# [1,] 0.9290406
``````

That is now a substantial correlation, which you can make go away by centring and/or standardizing. I generally go with just the centring.
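As a quick check that centring alone is enough, here is a minimal self-contained sketch. It rebuilds `a` and `b` with the same seed and the same order of draws as the session above, and wraps `scale` in `as.numeric` so the results stay plain vectors; the names `b_centred`, `r_raw` and `r_centred` are just mine for illustration.

```r
set.seed(1)  # same seed and draw order as the example above
a <- as.numeric(scale(rnorm(50)))          # standardized a, as a plain vector
b <- as.numeric(scale(rnorm(50))) * 2 + 5  # b rescaled to mean 5 and SD 2

r_raw <- cor(a, a * b)              # substantial: mean(b) is far from 0

b_centred <- b - mean(b)            # centring only; the SD of b stays at 2
r_centred <- cor(a, a * b_centred)  # back to near 0

c(raw = r_raw, centred = r_centred)
```

Leaving the SD of `b` at 2 makes no difference here, because multiplying `b` by a positive constant just multiplies `a * b` by the same constant and correlation ignores that.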

EDIT: @Tim has an answer here that’s a bit more directly on topic. I didn’t have Kruschke at the time. The correlation between intercept and slope is similar to the issue of correlation with interactions, though. They’re both about conditional relationships. The intercept is conditional on the slope, but unlike an interaction it’s one way, because the slope is not conditional on the intercept. Regardless, if the slope varies so will the intercept, unless the mean of the predictor is 0. Standardizing or centring the predictor variables will minimize the effect of the intercept changing with the slope: the mean will be at 0, so the regression line will pivot at the y-axis and its slope will have no effect on the intercept.
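Here is a small sketch of that intercept and slope trade-off. It uses ordinary least squares via `lm` as a rough stand-in for the posterior correlation an MCMC sampler would have to deal with, which is my simplification and not Kruschke's own example. The data, the seed, and the object names are all made up for illustration.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
x <- rnorm(50, mean = 5, sd = 2)    # hypothetical predictor with a non-zero mean
y <- 1 + 0.5 * x + rnorm(50)        # hypothetical linear relationship plus noise

fit_raw     <- lm(y ~ x)                 # predictor left as is
fit_centred <- lm(y ~ I(x - mean(x)))    # predictor centred at 0

# Correlation between the intercept and slope estimates in each fit
cor_raw     <- cov2cor(vcov(fit_raw))[1, 2]
cor_centred <- cov2cor(vcov(fit_centred))[1, 2]

c(raw = cor_raw, centred = cor_centred)

# The slope estimate itself is unchanged by centring
c(raw = unname(coef(fit_raw)[2]), centred = unname(coef(fit_centred)[2]))
```

With the raw predictor the two estimates should be strongly negatively correlated, and with the centred predictor the correlation should be essentially 0, while the slope stays the same. That near-diagonal shape is the kind of thing that tends to make a Metropolis sampler's life easier.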