# What is the interpretation of the covariance of regression coefficients?

The `lm` function in R can report the estimated covariance matrix of the regression coefficients (for example, via `vcov()` on a fitted model). What does this information give us? Can we use it to interpret the model better or to diagnose problems that might be present in the model?

The most basic use of the covariance matrix is to obtain the standard errors of the regression estimates. If you are only interested in the standard errors of the individual regression parameters, you can simply take the square roots of the diagonal entries.
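To make this concrete, here is a minimal sketch in Python with `numpy` (the simulated data and variable names are illustrative, not R's `lm` internals) of where the coefficient covariance matrix comes from in ordinary least squares and how its diagonal yields the standard errors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
grp = rng.integers(0, 2, n)                  # 0/1 group indicator
y = 2.0 + 1.5 * grp + rng.normal(0, 1, n)    # simulated response

X = np.column_stack([np.ones(n), grp])       # design matrix: intercept + grp
beta = np.linalg.solve(X.T @ X, X.T @ y)     # OLS coefficient estimates
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])    # residual variance estimate
S = sigma2 * np.linalg.inv(X.T @ X)          # covariance matrix of the coefficients

se = np.sqrt(np.diag(S))                     # standard errors of the individual coefficients
```

This is the same matrix R returns from `vcov(fit)`, and `np.sqrt(np.diag(S))` mirrors the standard-error column of `summary(fit)`.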

However, you may often be interested in a linear combination of regression parameters. For example, if you have an indicator variable for a given group, you may be interested in that group's mean, which would be

$\beta_0 + \beta_{\rm grp}$.

Then, to find the standard error of that group's estimated mean, you would compute

$\sqrt{X^\top S X}$,

where $X$ is the vector of contrast weights and $S$ is the covariance matrix of the coefficients. In our case, if the only additional covariate is “grp”, then $X = (1,1)$ ($1$ for the intercept, $1$ for belonging to the group).
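The contrast calculation can be sketched the same way (Python with `numpy`; the simulated data and names are my own). With the contrast vector $X = (1, 1)$, the standard error of the estimated group mean is $\sqrt{X^\top S X}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
grp = rng.integers(0, 2, n)                  # 0/1 group indicator
y = 2.0 + 1.5 * grp + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), grp])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)
S = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance matrix

x = np.array([1.0, 1.0])                     # contrast: intercept + grp
group_mean = x @ beta                        # estimated mean of the group
se_group_mean = np.sqrt(x @ S @ x)           # sqrt(x' S x)
```

For this simple design the result agrees with the familiar formula $\hat\sigma/\sqrt{n_{\rm grp}}$, since the fitted group mean is just that group's sample mean.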

Furthermore, the covariance matrix (or rather the correlation matrix, which is uniquely determined by the covariance matrix but not vice versa) can be very useful for certain model diagnostics. If two coefficient estimates are highly correlated, one way to think about it is that the model has trouble deciding which variable is responsible for an effect (because the variables are so closely related). This can be helpful in a variety of situations, such as choosing subsets of covariates for a predictive model: if two variables are highly correlated, you may want to include only one of them.
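A sketch of this diagnostic (Python with `numpy`; the rescaling from covariance to correlation is what R's `cov2cor` does): with two nearly collinear predictors, the correlation between their estimated coefficients ends up close to $-1$, reflecting the model's difficulty in attributing the effect to one or the other.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)     # nearly collinear with x1
y = 1.0 + x1 + x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 3)
S = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance matrix

d = np.sqrt(np.diag(S))
corr = S / np.outer(d, d)                    # correlation matrix of the coefficients
print(corr[1, 2])                            # near -1: the model can't separate x1 from x2
```

A strongly negative correlation here means that overstating one coefficient is compensated by understating the other, which is exactly the "which variable is responsible?" ambiguity described above.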