If I repeat every sample observation in a linear regression model and rerun the regression, how would the result be affected?

Say I have N observations, possibly with multiple factors, and I repeat each observation twice (or M times). How would a regression on this new set of size NM compare to a regression on just the original observations?

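Conceptually, you are adding no “new” information, but the regression treats every copy as an independent observation, so it behaves as if you “know” that information more precisely.

This therefore results in the same regression coefficients, with smaller standard errors: they shrink by a factor of roughly 1/sqrt(M).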
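
A short derivation may help show why the coefficients stay put while the standard errors fall. This is a sketch under the usual OLS setup; the symbols $X$ (the $N \times k$ design matrix, intercept included), $y$, $k$ and $\mathrm{RSS}$ are notation introduced here, not taken from the question. Stacking the data $M$ times gives $\tilde{X}$ and $\tilde{y}$ with

$$
\tilde{X}^\top \tilde{X} = M\,X^\top X, \qquad \tilde{X}^\top \tilde{y} = M\,X^\top y
\quad\Longrightarrow\quad
\tilde{\beta} = (M\,X^\top X)^{-1}\, M\,X^\top y = \hat{\beta}.
$$

The residuals are simply repeated $M$ times, so $\widetilde{\mathrm{RSS}} = M \cdot \mathrm{RSS}$, and the estimated covariance matrix becomes

$$
\widehat{\mathrm{Var}}(\tilde{\beta}) = \frac{M \cdot \mathrm{RSS}}{NM - k}\,(M\,X^\top X)^{-1} = \frac{\mathrm{RSS}}{NM - k}\,(X^\top X)^{-1},
\qquad
\frac{\mathrm{SE}(\tilde{\beta}_j)}{\mathrm{SE}(\hat{\beta}_j)} = \sqrt{\frac{N - k}{NM - k}} \approx \frac{1}{\sqrt{M}}.
$$

With $N = 74$, $k = 3$ and $M = 5$, this ratio is $\sqrt{71/367} \approx 0.44$.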

For example, in Stata, the expand x command replaces each observation with x copies of itself.

``````
sysuse auto, clear
regress mpg weight length
------------------------------------------------------------------------------
mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight |  -.0038515    .001586    -2.43   0.018    -.0070138   -.0006891
length |  -.0795935   .0553577    -1.44   0.155    -.1899736    .0307867
_cons |   47.88487    6.08787     7.87   0.000       35.746    60.02374
------------------------------------------------------------------------------

expand 5

regress mpg weight length
------------------------------------------------------------------------------
mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight |  -.0038515   .0006976    -5.52   0.000    -.0052232   -.0024797
length |  -.0795935   .0243486    -3.27   0.001    -.1274738   -.0317131
_cons |   47.88487   2.677698    17.88   0.000     42.61932    53.15043
------------------------------------------------------------------------------
``````

As you can see, the coefficient estimates are unchanged, while formerly insignificant coefficients (length) become statistically significant in the expanded model, representing the precision with which you “know” what you know.
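
The same effect can be checked outside Stata. The following is a minimal sketch in plain Python for the one-predictor case; the helper `simple_ols` and the data values are made up for illustration (they are not the auto dataset), and the model has k = 2 estimated parameters (intercept and slope).

```python
import math

def simple_ols(x, y):
    """Return (slope, standard error of the slope) for y = a + b*x fitted by OLS."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # two estimated parameters: intercept and slope
    return slope, math.sqrt(sigma2 / sxx)

# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
M = 5  # number of copies of each observation

slope, se = simple_ols(x, y)
# Repeating the lists M times stacks M copies of the dataset
slope_rep, se_rep = simple_ols(x * M, y * M)

# Ratio predicted by sqrt((N - k) / (N*M - k)) with k = 2
predicted_ratio = math.sqrt((len(x) - 2) / (len(x) * M - 2))

print(f"slope: {slope:.6f} (original) vs {slope_rep:.6f} (repeated)")
print(f"SE ratio: {se_rep / se:.6f}, predicted: {predicted_ratio:.6f}")
```

The slope is the same in both fits, and the ratio of the two standard errors matches sqrt((N - k)/(NM - k)), which approaches 1/sqrt(M) as N grows.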