Say I have N observations, possibly with multiple factors, and I repeat each observation twice (or M times). How would a regression on this new set of size NM compare to a regression on just the original observations?

**Answer**

Conceptually, you are adding no “new” information, but the regression treats every copy as a separate observation, so it acts as though you “know” that information more precisely.

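The result is the same regression coefficients with smaller standard errors: with k estimated coefficients (including the constant), each standard error shrinks by a factor of sqrt((N - k)/(NM - k)), which is roughly 1/sqrt(M).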
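A short derivation shows why. It is a sketch under the usual OLS setup, and the notation is an assumption rather than part of the original answer: $X$ is the $N \times k$ design matrix (including the constant), $y$ is the response, and $X_M$, $y_M$ stack $M$ copies of every row.

$$
\begin{aligned}
X_M^\top X_M &= M\,X^\top X, \qquad X_M^\top y_M = M\,X^\top y \\
\hat\beta_M &= (M\,X^\top X)^{-1}\,M\,X^\top y = \hat\beta \\
\mathrm{RSS}_M &= M\,\mathrm{RSS}, \qquad \hat\sigma_M^2 = \frac{M\,\mathrm{RSS}}{NM-k} \\
\widehat{\mathrm{Var}}(\hat\beta_M) &= \hat\sigma_M^2\,(M\,X^\top X)^{-1} = \frac{\mathrm{RSS}}{NM-k}\,(X^\top X)^{-1} \\
\frac{\mathrm{se}(\hat\beta_{M,j})}{\mathrm{se}(\hat\beta_j)} &= \sqrt{\frac{N-k}{NM-k}}
\end{aligned}
$$

For the Stata example in this answer, the auto dataset has N = 74 observations, k = 3 and M = 5, so the factor is sqrt(71/367) ≈ 0.44. That is consistent with the standard error on weight going from .001586 to .0006976.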

For example, in Stata, the **expand x** command replaces each observation with **x** copies of itself.

```
sysuse auto, clear
regress mpg weight length
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0038515    .001586    -2.43   0.018    -.0070138   -.0006891
      length |  -.0795935   .0553577    -1.44   0.155    -.1899736    .0307867
       _cons |   47.88487    6.08787     7.87   0.000       35.746    60.02374
------------------------------------------------------------------------------
expand 5
regress mpg weight length
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0038515   .0006976    -5.52   0.000    -.0052232   -.0024797
      length |  -.0795935   .0243486    -3.27   0.001    -.1274738   -.0317131
       _cons |   47.88487   2.677698    17.88   0.000     42.61932    53.15043
------------------------------------------------------------------------------
```

As you can see, a formerly insignificant coefficient (length, p = 0.155) becomes statistically significant (p = 0.001) in the expanded model, even though the point estimates are unchanged. This reflects the greater precision with which you “know” what you know.
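The same behaviour can be checked outside Stata. Below is a minimal Python sketch using only the standard library. It assumes a one-predictor regression and made-up data (not the auto dataset), and `simple_ols` is a hypothetical helper written for this illustration. It compares the slope and its standard error before and after duplicating every observation M times.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (intercept, slope, se_slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, divided by n - 2 because two parameters are estimated.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope

# Made-up illustrative data (an assumption, not taken from the answer).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

M = 5  # copies of each observation, analogous to `expand 5`
a1, b1, se1 = simple_ols(x, y)
a2, b2, se2 = simple_ols(x * M, y * M)  # list repetition keeps (x, y) pairs aligned

n, k = len(x), 2  # k counts the intercept and the slope
predicted_ratio = math.sqrt((n - k) / (n * M - k))

print(f"slope: {b1:.6f} (original) vs {b2:.6f} (duplicated)")
print(f"SE ratio: {se2 / se1:.6f}, predicted: {predicted_ratio:.6f}")
```

The two slopes should agree up to floating-point error, and the observed ratio of standard errors should match sqrt((n - k)/(nM - k)).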

**Attribution** *Source: Link, Question Author: Palace Chan, Answer Author: pmgjones*