# What are the consequences of “copying” a data set for OLS?

Suppose I have a random sample $\lbrace X_i, Y_i\rbrace_{i=1}^n$, and assume the Gauss-Markov assumptions are satisfied so that I can construct the OLS estimator.

Now suppose I take my data set and double it, meaning there is an exact copy for each of the $n$ $(X_i,Y_i)$ pairs.

# My Question

How does this affect my ability to use OLS? Is it still consistent and identified?

With the linear model
$$\DeclareMathOperator{\V}{\mathbb{V}} Y = X \beta + E,$$
the least squares estimator is $$\hat{\beta}_{\text{ols}} = (X^T X)^{-1} X^T Y$$ and its variance matrix is $$\V \hat{\beta}_{\text{ols}} = \sigma^2 (X^T X)^{-1}.$$ “Doubling the data” means that $Y$ is replaced by $\begin{pmatrix} Y \\ Y \end{pmatrix}$ and $X$ is replaced by $\begin{pmatrix} X \\ X \end{pmatrix}$. The ordinary least squares estimator then becomes
$$\left(\begin{pmatrix} X \\ X \end{pmatrix}^T \begin{pmatrix} X \\ X \end{pmatrix} \right)^{-1} \begin{pmatrix} X \\ X \end{pmatrix}^T \begin{pmatrix} Y \\ Y \end{pmatrix} = (X^T X + X^T X)^{-1} (X^T Y + X^T Y) = (2 X^T X)^{-1}\, 2 X^T Y = \hat{\beta}_{\text{ols}},$$
so the calculated estimator doesn’t change at all. But the calculated variance matrix becomes wrong: using the same kind of algebra as above, we get the variance matrix $\frac{\sigma^2}{2}(X^T X)^{-1}$, half of the correct value. A consequence is that confidence intervals will shrink by a factor of $\frac{1}{\sqrt{2}}$.
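A quick numerical check of both claims (a sketch with simulated data, using NumPy; the model and sample size here are arbitrary choices): the coefficients from the doubled sample coincide with the original ones, while the naive variance estimate $\hat\sigma^2 (X^T X)^{-1}$ comes out at roughly half its original value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# doubled sample: an exact copy of every row
X2, y2 = np.vstack([X, X]), np.concatenate([y, y])

def ols(X, y):
    """OLS coefficients and the usual iid variance estimate s^2 (X'X)^{-1}."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, s2 * np.linalg.inv(X.T @ X)

b1, V1 = ols(X, y)
b2, V2 = ols(X2, y2)

print(np.allclose(b1, b2))   # True: the point estimates are unchanged
print(V2 / V1)               # entrywise ratio, close to 1/2
```

(The ratio is not exactly $1/2$ because the degrees-of-freedom correction changes from $n-k$ to $2n-k$, but for moderate $n$ that effect is negligible.)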
The reason is that we have calculated as if we still had iid data, which is untrue: each doubled pair of observations obviously has a correlation equal to $1.0$. If we take this dependence into account and apply weighted least squares correctly, we will find the correct variance matrix.
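To see numerically that accounting for the within-pair dependence removes the artifact, here is a sketch (simulated data again) using a cluster-robust “sandwich” variance, treating each original observation and its copy as one cluster; this is one standard correction alongside the weighted least squares route mentioned above. On exactly doubled data it reproduces the heteroskedasticity-robust (HC0) variance computed from the original sample, with no spurious halving.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
X2, y2 = np.vstack([X, X]), np.concatenate([y, y])

b = np.linalg.solve(X2.T @ X2, X2.T @ y2)
e2 = y2 - X2 @ b                      # residuals on the doubled data
bread = np.linalg.inv(X2.T @ X2)

# cluster-robust "meat": each original observation and its copy form one cluster
clusters = np.tile(np.arange(n), 2)
meat = np.zeros((2, 2))
for g in range(n):
    s = X2[clusters == g].T @ e2[clusters == g]   # score summed within cluster
    meat += np.outer(s, s)

V_cluster = bread @ meat @ bread

# HC0 variance on the original sample, for comparison
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
Vi = np.linalg.inv(X.T @ X)
V_hc0 = Vi @ (X.T * e**2) @ X @ Vi

print(np.allclose(V_cluster, V_hc0))  # True: the halving artifact is gone
```

The equality is exact here: the within-cluster score sums are twice the original scores, which cancels the factor $(2X^T X)^{-1}$ appearing twice in the sandwich.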