Estimating b1x1+b2x2 instead of b1x1+b2x2+b3x3

I have a theoretical economic model which is as follows,

y=a+b1x1+b2x2+b3x3+u

So theory says that three factors, x1, x2 and x3, determine y.

Now I have the real data and I need to estimate b1, b2, b3. The problem is that the real data set contains only data for x1 and x2; there are no data for x3. So the model I can actually fit is:

y=a+b1x1+b2x2+u

  • Is it OK to estimate this model?
  • Do I lose anything estimating it?
  • If I do estimate b1, b2, then where does the b3x3 term go?
  • Is it accounted for by error term u?

And we would like to assume that x3 is not correlated with x1 and x2.

Answer

The issue you need to worry about is called endogeneity (here, in the form of omitted-variable bias). More specifically, it depends on whether x3 is correlated in the population with x1 or x2. If it is, then the estimates bj associated with the correlated variables will be biased. That is because OLS regression forces the residuals, ui, to be uncorrelated with your covariates, xj. However, the true error term of the model you fit is composed of some irreducible randomness, εi, and the unobserved (but relevant) variable, x3, which by stipulation is correlated with x1 and/or x2. OLS can only make the residuals uncorrelated with the covariates by shifting the affected bj away from their true values. On the other hand, if both x1 and x2 are uncorrelated with x3 in the population, then their bs won’t be biased by this (they may well be biased by something else, of course). One way econometricians try to deal with this issue is by using instrumental variables.
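The point about residuals can be checked directly. The following is only a minimal sketch: the seed, the sample size n, and the names eps and trueErr are my own choices for illustration, and the coefficients are borrowed from the simulation further down. It compares the correlation of x2 with the true error term (which contains x3) to the correlation of x2 with the fitted residuals.

```r
library(MASS)                          # for mvrnorm()
set.seed(1)                            # arbitrary seed, for reproducibility
n   = 1000                             # illustrative sample size
X   = mvrnorm(n, mu=c(0,0), Sigma=rbind(c( 1, .7),
                                        c(.7,  1)))
x1  = rnorm(n)
x2  = X[,1]
x3  = X[,2]                            # x3 is correlated w/ x2, but not x1
eps = rnorm(n)                         # the irreducible randomness
y   = -71 + .84*x1 + .64*x2 + .34*x3 + eps
mod = lm(y~x1+x2)                      # the fitted model omits x3

trueErr = .34*x3 + eps                 # what the error term really contains
cor(trueErr, x2)                       # clearly positive
cor(resid(mod), x2)                    # zero, up to floating point error
```

The second correlation is zero by construction, whatever the data look like; the coefficient on x2 has absorbed the part of x3 that moves together with x2.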

For the sake of greater clarity, I’ve written a quick simulation in R that demonstrates that the sampling distribution of b2 is unbiased (centered on the true value of β2) when x2 is uncorrelated with x3. In the second run, however, note that x3 is uncorrelated with x1, but correlated with x2. Not coincidentally, b1 is unbiased, but b2 is biased.

library(MASS)                          # you'll need this package below
N     = 100                            # this is how much data we'll use
beta0 = -71                            # these are the true values of the
beta1 = .84                            # parameters
beta2 = .64
beta3 = .34

############## uncorrelated version

b0VectU = vector(length=10000)         # these will store the parameter
b1VectU = vector(length=10000)         # estimates
b2VectU = vector(length=10000)
set.seed(7508)                         # this makes the simulation reproducible

for(i in 1:10000){                     # we'll do this 10k times
  x1 = rnorm(N)
  x2 = rnorm(N)                        # these variables are uncorrelated
  x3 = rnorm(N)
  y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
  mod = lm(y~x1+x2)                    # note all 3 variables are relevant
                                       # but the model omits x3
  b0VectU[i] = coef(mod)[1]            # here I'm storing the estimates
  b1VectU[i] = coef(mod)[2]
  b2VectU[i] = coef(mod)[3]
}
mean(b0VectU)  # [1] -71.00005         # all 3 of these are centered on
mean(b1VectU)  # [1] 0.8399306         # the true values / are unbiased
mean(b2VectU)  # [1] 0.6398391         # e.g., .64 = .64

############## correlated version

r23 = .7                               # this will be the correlation in the
b0VectC = vector(length=10000)         # population between x2 & x3
b1VectC = vector(length=10000)
b2VectC = vector(length=10000)
set.seed(2734)

for(i in 1:10000){
  x1 = rnorm(N)
  X  = mvrnorm(N, mu=c(0,0), Sigma=rbind(c(  1, r23),
                                         c(r23,   1)))
  x2 = X[,1]
  x3 = X[,2]                           # x3 is correlated w/ x2, but not x1
  y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
                                       # once again, all 3 variables are relevant
  mod = lm(y~x1+x2)                    # but the model omits x3
  b0VectC[i] = coef(mod)[1]
  b1VectC[i] = coef(mod)[2]            # we store the estimates again
  b2VectC[i] = coef(mod)[3]
}
mean(b0VectC)  # [1] -70.99916         # the 1st 2 are unbiased
mean(b1VectC)  # [1] 0.8409656         # but the sampling dist of b2 is biased
mean(b2VectC)  # [1] 0.8784184         # .88 not equal to .64
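
The size of the bias in the correlated run is not arbitrary. As a rough check, it can be compared with the standard omitted-variable-bias result, written out below. The auxiliary coefficients δ0, δ1, δ2 and the error v are notation introduced here, not part of the original model; they describe the population regression of the omitted x3 on the included x1 and x2.

```latex
\begin{align*}
% Auxiliary regression of the omitted variable on the included ones
x_3 &= \delta_0 + \delta_1 x_1 + \delta_2 x_2 + v \\
% Expected OLS slopes when x_3 is left out of the fitted model
E[b_1] &= \beta_1 + \beta_3 \delta_1, \qquad
E[b_2] = \beta_2 + \beta_3 \delta_2 \\
% Correlated run: \delta_1 = 0, and \delta_2 = r_{23} = .7
% because x_2 and x_3 both have unit variance
E[b_1] &= .84 + .34 \times 0 = .84, \qquad
E[b_2] = .64 + .34 \times .7 = .878
\end{align*}
```

This agrees with the simulated means above (0.841 and 0.878). In the uncorrelated run both δ1 and δ2 are zero, so neither slope is biased.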

Attribution
Source: Link, Question Author: renathy, Answer Author: gung – Reinstate Monica
