# Why do I get same results for OLS and GLS in R?

When I run this code:

``````require(nlme)

a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9))

b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9))

res <- lm(a ~ b)

print(summary(res))

res_gls <- gls(a ~ b)

print(summary(res_gls))
``````

I get the same coefficients and the same statistical significance on the coefficients:

``````Loading required package: nlme

Call:
lm(formula = a ~ b)

Residuals:
Min      1Q  Median      3Q     Max
-2.7361 -1.1348 -0.2955  1.2463  3.8234

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.0576     1.8732   1.098   0.3005
b             0.5595     0.2986   1.874   0.0937 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.088 on 9 degrees of freedom
Multiple R-squared: 0.2807, Adjusted R-squared: 0.2007
F-statistic: 3.512 on 1 and 9 DF,  p-value: 0.09371

Generalized least squares fit by REML
Model: a ~ b
Data: NULL
AIC      BIC    logLik
51.0801 51.67177 -22.54005

Coefficients:
Value Std.Error  t-value p-value
(Intercept) 2.0576208 1.8731573 1.098477  0.3005
b           0.5594796 0.2985566 1.873948  0.0937

Correlation:
(Intr)
b -0.942

Standardized residuals:
Min         Q1        Med         Q3        Max
-1.3104006 -0.5434780 -0.1415446  0.5968911  1.8311781

Residual standard error: 2.087956
Degrees of freedom: 11 total; 9 residual
``````

Why is this happening? In what cases do OLS estimates are the same as GLS estimates?

You got the same results because you didn’t specify a special variance or correlation structure in the `gls` function. Without such options, a GLS behaves like a OLS. The advantage of a GLS model over a normal regression is the ability to specify a correlation structure (option `correlation`) or allowing the residual variance to differ (option `weights`). Let me show this with an example.

``````library(nlme)

set.seed(1500)

x <- rnorm(10000,100,12) # generate x with arbitrary values

y1 <- 10 + 15*x + rnorm(10000,0,5) # the first half of the dataset

y2 <-  -2 - 5*x + rnorm(10000,0,15) # the 2nd half of the data set with 3 times larger residual SD (15 vs. 5)

y <- c(y1, y2)
x.new <- c(x, x)

dummy.var <- c(rep(0, length(y1)), rep(1, length(y2))) # dummy variable to distinguish the first half of the dataset (y1) from the second (y2)

# Calculate a normal regression model

lm.mod <- lm(y~x.new*dummy.var)

summary(lm.mod)

Coefficients:
Estimate Std. Error   t value Pr(>|t|)
(Intercept)      10.27215    0.94237    10.900   <2e-16 ***
x.new            14.99691    0.00935  1603.886   <2e-16 ***
dummy.var       -12.07076    1.33272    -9.057   <2e-16 ***
x.new:dummy.var -19.99891    0.01322 -1512.387   <2e-16 ***

# Calculate a GLS without any options

gls.mod.1 <- gls(y~x.new*dummy.var)

summary(gls.mod.1)

Coefficients:
Value Std.Error    t-value p-value
(Intercept)      10.27215 0.9423749    10.9003       0
x.new            14.99691 0.0093504  1603.8857       0
dummy.var       -12.07076 1.3327194    -9.0572       0
x.new:dummy.var -19.99891 0.0132234 -1512.3868       0

# GLS again, but allowing different residual variance for y1 and y2

gls.mod.2 <- gls(y~x.new*dummy.var, weights=varIdent(form=~1|dummy.var))

summary(gls.mod.2)

Parameter estimates:
0        1
1.000000 2.962565

Coefficients:
Value Std.Error   t-value p-value
(Intercept)      10.27215 0.4262268    24.100       0
x.new            14.99691 0.0042291  3546.144       0
dummy.var       -12.07076 1.3327202    -9.057       0
x.new:dummy.var -19.99891 0.0132234 -1512.386       0

# Perform a likelihood ratio test

anova(gls.mod.1, gls.mod.2)

Model df      AIC      BIC    logLik   Test  L.Ratio p-value
gls.mod.1     1  5 153319.4 153358.9 -76654.69
gls.mod.2     2  6 143307.2 143354.6 -71647.61 1 vs 2 10014.15  <.0001
``````

The first GLS model (`gls.mod.1`) and the normal linear regression model (`lm.mod`) yield exactly the same results. The GLS model allowing for different residual standard deviations (`gls.mod.2`) estimates the residual SD of `y2` to be around 3 times larger than the residual SD of `y1` which is exactly what we specified when we generated the data. The regression coefficients are practically the same, but the standard errors have changed. The likelihood ratio test (and AIC) suggests that the GLS model with the different residual variances (`gls.mod.2`) fits the data better than the normal model (`lm.mod` or `gls.mod.1`).

Variance and correlation structures in `gls`

You can specify several variance structures in the `gls` function and the option `weights`. See here for a list. For a list of correlation structures for the option `correlation` see here.