I’ve read that model parameters become unstable in the presence of multicollinearity. Can someone give an example of this behavior and explain why it happens?

Please use the following multiple linear regression for illustration:

$$y = a_1 x_1 + a_2 x_2 + b$$

**Answer**

# What Is It?

Here is an example of this behavior. I’m going to write a function to simulate regressions and output their coefficients. We’ll look at the coordinate pair of coefficients $(a_1, a_2)$ in the case of no collinearity and high collinearity. Here is some code:

```
library(tidyverse)

sim <- function(rho) {
  # Number of samples to draw
  N <- 50
  # Covariance matrix of the two predictors (unit variances, covariance rho)
  covar <- matrix(c(1, rho, rho, 1), byrow = TRUE, nrow = 2)
  # Design matrix: a column of 1s followed by N draws from a
  # 2-dimensional Gaussian with covariance matrix covar
  X <- cbind(rep(1, N), MASS::mvrnorm(N, mu = c(0, 0), Sigma = covar))
  # True coefficients: intercept b = 1, a1 = 2, a2 = 4
  betas <- c(1, 2, 4)
  # Make the outcome
  y <- drop(X %*% betas) + rnorm(N, 0, 1)
  # Fit a linear model
  model <- lm(y ~ X[, 2] + X[, 3])
  # Return a one-row data frame of the slope estimates
  tibble(a1 = coef(model)[[2]], a2 = coef(model)[[3]])
}

# Run the function 1000 times and stack the results
# (map() is used here because purrr's rerun() is deprecated)
zero_covar <- map(1:1000, ~ sim(0)) %>% bind_rows()

# Same as above, but the covariance in the covar matrix is now non-zero
high_covar <- map(1:1000, ~ sim(0.95)) %>% bind_rows()

# Plot: red = high collinearity, black = no collinearity (drawn on top)
zero_covar %>%
  ggplot(aes(a1, a2)) +
  geom_point(data = high_covar, color = 'red') +
  geom_point()
```

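Run that and you get a scatter plot of the fitted coefficient pairs, with black points for the uncorrelated case and red points for the highly correlated case.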

This simulation approximates the sampling distribution of the coefficients. As we can see, in the case of no collinearity (black dots) the sampling distribution is tightly concentrated around the true value of $(2, 4)$, and the blob is roughly symmetric about this point.

In the case of high collinearity (red dots), the coefficients of the linear model can vary quite a lot! Instability in this case manifests as wildly different coefficient values given the same data generating process.
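To put rough numbers on that spread, a smaller self-contained sketch can summarise the simulated coefficients directly. The helper name `sim_coefs`, the seed, and the 500 replications are my own choices for illustration; the data-generating process is the same as in the code above.

```r
# Illustrative sketch: helper name, seed and replication count are assumptions
set.seed(1)

# Fit one regression on N draws with predictor correlation rho and
# return the two estimated slopes (a1, a2)
sim_coefs <- function(rho, N = 50) {
  covar <- matrix(c(1, rho, rho, 1), nrow = 2)
  x <- MASS::mvrnorm(N, mu = c(0, 0), Sigma = covar)
  y <- 1 + 2 * x[, 1] + 4 * x[, 2] + rnorm(N)
  unname(coef(lm(y ~ x))[2:3])
}

# Each column of the result is one simulated (a1, a2) pair
zero <- replicate(500, sim_coefs(0))
high <- replicate(500, sim_coefs(0.95))

# Standard deviation of a1 and a2 across simulations
sd_zero <- apply(zero, 1, sd)
sd_high <- apply(high, 1, sd)

# Correlation between the two slope estimates in the collinear case
cor_high <- cor(high[1, ], high[2, ])

print(rbind(rho_0 = sd_zero, rho_0.95 = sd_high))
print(cor_high)
```

With these settings I would expect the standard deviations for a correlation of 0.95 to come out roughly three times those for a correlation of 0, and the two slope estimates to be strongly negatively correlated, so the collinear cloud should look like a long, tilted ellipse.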

# Why Is This Happening?

Let’s take a statistical perspective. The sampling distribution for the coefficients of a linear regression (with enough data) looks like

$$\hat{\beta} \sim N(\beta, \Sigma)$$

The covariance matrix for the above is

$$\Sigma = \sigma^2 (X'X)^{-1}$$
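This formula can be evaluated for the simulation settings used earlier. As a sketch, I replace $X'X$ with a large-sample approximation, $N$ times the population second-moment matrix of $(1, x_1, x_2)$; that is an assumption which ignores sampling noise in $X$, and `coef_cov` is a hypothetical helper name.

```r
# Approximate covariance of the coefficient estimates, sigma^2 (X'X)^{-1},
# under the assumption X'X ~ N * E[x x'] for x = (1, x1, x2)
coef_cov <- function(rho, N = 50, sigma2 = 1) {
  # Second-moment matrix: intercept column plus zero-mean, unit-variance
  # predictors with correlation rho
  M <- matrix(c(1, 0,   0,
                0, 1,   rho,
                0, rho, 1), nrow = 3, byrow = TRUE)
  sigma2 * solve(N * M)
}

# Approximate standard errors of (b, a1, a2)
se_zero <- sqrt(diag(coef_cov(0)))
se_high <- sqrt(diag(coef_cov(0.95)))

print(round(se_zero, 3))  # 0.141 0.141 0.141
print(round(se_high, 3))  # 0.141 0.453 0.453
```

Under this approximation the intercept is unaffected, while the standard errors of both slopes grow by a factor of $1/\sqrt{1 - \rho^2} \approx 3.2$.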

Let’s focus for a minute on $X'X$. This is a Gram matrix, so it is symmetric and positive semidefinite. If $X$ has full column rank, it is positive definite, meaning all of its eigenvalues are strictly positive. Because it is symmetric, we can decompose it using an eigenvalue decomposition.

$$X'X = Q \Lambda Q^{-1}$$

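Suppose now one of the columns of $X$ is highly correlated with another column. Then $X'X$ is close to singular, so at least one of its eigenvalues is close to 0. For instance, two standardized predictors with correlation $\rho$ have a correlation matrix with eigenvalues $1 + \rho$ and $1 - \rho$, and the second goes to 0 as $\rho \to 1$. Inverting this product gives us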

$$(X'X)^{-1} = Q \Lambda^{-1} Q^{-1}$$

Since $\Lambda$ is a diagonal matrix, $(\Lambda^{-1})_{jj} = 1/\Lambda_{jj}$. If one of the eigenvalues is really small, then one of the elements of $\Lambda^{-1}$ is really big, and so too is the covariance, leading to this instability in the coefficients.
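Here is a small numerical check of this argument. As a simplifying assumption, I use the 2×2 correlation matrix of the two predictors as a stand-in for $X'X$, which drops the intercept and the factor of $N$. The variable names are my own.

```r
rho <- 0.95
A <- matrix(c(1, rho, rho, 1), nrow = 2)

# Eigendecomposition A = Q Lambda Q'; for a symmetric matrix Q is
# orthogonal, so Q^{-1} = Q'
e <- eigen(A, symmetric = TRUE)
Q <- e$vectors
lambda <- e$values          # 1 + rho and 1 - rho

# Rebuild the inverse from the decomposition: Q Lambda^{-1} Q'
A_inv <- Q %*% diag(1 / lambda) %*% t(Q)

print(lambda)               # 1.95 0.05
print(1 / lambda)           # the small eigenvalue turns into 20
print(all.equal(A_inv, solve(A)))
```

The diagonal entries of `A_inv` equal $1/(1 - \rho^2) \approx 10.3$, which is the factor by which each slope's variance is inflated relative to the uncorrelated case.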

I think I got that right, it has been a long time since I’ve done linear algebra.

**Attribution** *Source: Link, Question Author: Eric Kim, Answer Author: kjetil b halvorsen*