Linear regression model best suited to data with errors

I’m looking for the linear regression algorithm best suited to data whose independent variable (x) has a constant measurement error and whose dependent variable (y) has a signal-dependent error.

[Image: the asker's plot illustrating the question (not available)]

The above image illustrates my question.

Answer

Measurement error in the dependent variable

Given a general linear model
$$y^* = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon \qquad (1)$$
with $\varepsilon$ homoskedastic, not autocorrelated, and uncorrelated with the independent variables, let $y^*$ denote the “true” variable and $y$ its observable measure. The measurement error is defined as their difference:
$$e = y - y^*$$
Thus, the estimable model is:
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + e + \varepsilon$$
Since $y, x_1, \dots, x_k$ are observed, we can estimate this model by OLS. If the measurement error in $y$ is statistically independent of each explanatory variable, then $(e + \varepsilon)$ shares the same properties as $\varepsilon$ and the usual OLS inference procedures ($t$ statistics, etc.) are valid. In your case, however, the error in $y$ is signal dependent, so I’d expect the variance of $e$ to increase with the level of $y$, which makes the composite error heteroskedastic. You could use:

  • a weighted least squares estimator (e.g. Kutner et al., §11.1; Verbeek, §4.3.1-3);

  • the OLS estimator, which is still unbiased and consistent, combined with heteroskedasticity-consistent standard errors, also known as White standard errors (Verbeek, §4.3.4).
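A minimal NumPy sketch of both options, on simulated data only. It assumes (hypothetically) a true line $y^* = 2 + 3x$ and an error in $y$ whose standard deviation is 10% of the signal. For WLS it assumes the error standard deviation is proportional to the signal and estimates the signal with the OLS fitted values (a feasible WLS); the weights only need to be correct up to a constant factor. The standard errors are White's original HC0 sandwich estimator. The function names `ols_white` and `wls` are made up for this example.

```python
import numpy as np


def ols_white(x, y):
    """OLS fit of y on [1, x]; returns coefficients, HC0 (White) SEs, fitted values."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 [sum_i r_i^2 x_i x_i'] (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se, X @ beta


def wls(x, y, w):
    """Weighted least squares: minimizes sum_i w_i * (y_i - b0 - b1 * x_i)^2."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Simulated data (hypothetical): true line y* = 2 + 3x, error sd = 10% of y*
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, n)
y_star = 2.0 + 3.0 * x
y = y_star + rng.normal(0.0, 0.1 * y_star)

beta_ols, se_white, fitted = ols_white(x, y)
# Feasible WLS: assume sd(e) proportional to the signal, estimated by the OLS
# fitted values (all positive here); weights only matter up to a constant.
beta_wls = wls(x, y, 1.0 / fitted**2)

print("OLS:", beta_ols, "White SE:", se_white)
print("WLS:", beta_wls)
```

Both estimators should recover the slope; WLS uses the assumed variance structure for efficiency, while OLS with White standard errors avoids having to model that structure at all.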

Measurement error in the independent variable

Given the same linear model as above, now with the dependent variable measured without error, let $x_k^*$ denote the “true” value of the $k$-th regressor and $x_k$ its observable measure. The measurement error is then:
$$e_k = x_k - x_k^*$$
There are two main situations (Wooldridge, §4.4.2).

  • $\mathrm{Cov}(x_k, e_k) = 0$: the measurement error is uncorrelated with the observed measure and must therefore be correlated with the unobserved variable $x_k^*$; writing $x_k^* = x_k - e_k$ and plugging this into (1):
    $$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + (\varepsilon - \beta_k e_k)$$
    since $\varepsilon$ and $e_k$ are both uncorrelated with each $x_j$, including $x_k$, the measurement error only increases the error variance and violates none of the OLS assumptions;

  • $\mathrm{Cov}(x_k^*, e_k) = 0$ (the classical errors-in-variables case): the measurement error is uncorrelated with the unobserved variable and must therefore be correlated with the observed measure $x_k$; such a correlation causes problems, and the OLS regression of $y$ on $x_1, \dots, x_k$ generally gives biased and inconsistent estimators.
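The two scenarios can be contrasted with a small simulation with a single regressor. All parameter values below are hypothetical. In the first scenario the observed $x$ is drawn first and the true value scatters around it ($x^* = x - e$); in the second the true value is drawn first and the observed value scatters around it ($x = x^* + e$). For the classical case with one regressor, the OLS slope converges to $\beta_1 \sigma^2_{x^*} / (\sigma^2_{x^*} + \sigma^2_e)$ (attenuation bias), which the sketch prints for comparison. The helper `ols_slope` is made up for this example.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


rng = np.random.default_rng(1)
n = 200_000
beta0, beta1 = 1.0, 2.0
sd_x, sigma_e = 2.0, 1.0  # hypothetical spread of the regressor and of its error

# Scenario 1, Cov(x, e) = 0: observed x is drawn first,
# the unobserved true value x* = x - e scatters around it.
x_obs_1 = rng.normal(0.0, sd_x, n)
x_star_1 = x_obs_1 - rng.normal(0.0, sigma_e, n)
y_1 = beta0 + beta1 * x_star_1 + rng.normal(0.0, 1.0, n)

# Scenario 2, Cov(x*, e) = 0 (classical errors-in-variables):
# true x* is drawn first, the observed x = x* + e scatters around it.
x_star_2 = rng.normal(0.0, sd_x, n)
x_obs_2 = x_star_2 + rng.normal(0.0, sigma_e, n)
y_2 = beta0 + beta1 * x_star_2 + rng.normal(0.0, 1.0, n)

slope_1 = ols_slope(x_obs_1, y_1)  # close to beta1: no bias
slope_2 = ols_slope(x_obs_2, y_2)  # shrunk toward zero
lam = sd_x**2 / (sd_x**2 + sigma_e**2)  # theoretical attenuation factor
print(slope_1, slope_2, beta1 * lam)
```

With these values the attenuation factor is 4/5, so the second slope should land near 1.6 instead of 2.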

As far as I can guess by looking at your plot (errors centered on the “true” values of the independent variable), the first scenario could apply.

Attribution
Source: Link, Question Author: user46178, Answer Author: Sergio
