Suppose we fit a simple linear model, for example a one-way ANOVA:

```r
# data generation
set.seed(1.234)
Ng <- c(41, 37, 42)
data <- rnorm(sum(Ng), mean = rep(c(-1, 0, 1), Ng), sd = 1)
fact <- as.factor(rep(LETTERS[1:3], Ng))

m1 <- lm(data ~ 0 + fact)
summary(m1)
```

The result is as follows:

```
Call:
lm(formula = data ~ 0 + fact)

Residuals:
     Min       1Q   Median       3Q      Max
-2.30047 -0.60414 -0.04078  0.54316  2.25323

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
factA  -0.9142     0.1388  -6.588 1.34e-09 ***
factB   0.1484     0.1461   1.016    0.312
factC   1.0990     0.1371   8.015 9.25e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8886 on 117 degrees of freedom
Multiple R-squared: 0.4816,	Adjusted R-squared: 0.4683
F-statistic: 36.23 on 3 and 117 DF,  p-value: < 2.2e-16
```

Now I try two different methods to estimate the confidence intervals of these parameters:

```r
c <- coef(summary(m1))

# 1st method: CI limits from SE, assuming a normal distribution
cbind(low  = c[, 1] - qnorm(p = 0.975) * c[, 2],
      high = c[, 1] + qnorm(p = 0.975) * c[, 2])

# 2nd method
confint(m1)
```

## Questions:

- What is the distribution of estimated linear regression coefficients? Normal or $t$?
- Why do the two methods yield different results? Assuming a normal distribution and correct SEs, I’d expect both methods to give the same result.
Thank you very much!


**EDIT after an answer:** The answer is exactly right; the following gives exactly the same result as

`confint(m1)`

```r
# 3rd method: CI limits from SE, using the t-distribution
cbind(low  = c[, 1] - qt(p = 0.975, df = sum(Ng) - 3) * c[, 2],
      high = c[, 1] + qt(p = 0.975, df = sum(Ng) - 3) * c[, 2])
```

**Answer**

**(1)** When the errors are normally distributed and their variance is *not* known, then $$\frac{\hat{\beta} - \beta_0}{{\rm se}(\hat{\beta})}$$ has a $t$-distribution under the null hypothesis that $\beta_0$ is the true regression coefficient. The default in `R` is to test $\beta_0 = 0$, so the $t$-statistics reported there are just $$\frac{\hat{\beta}}{{\rm se}(\hat{\beta})}$$
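You can verify this directly on the model from the question: dividing the estimates by their standard errors reproduces the `t value` column of the coefficient table (the code below regenerates `m1` so it runs standalone).

```r
# Recompute the t-statistics reported by summary(m1) by hand
set.seed(1.234)
Ng <- c(41, 37, 42)
data <- rnorm(sum(Ng), mean = rep(c(-1, 0, 1), Ng), sd = 1)
fact <- as.factor(rep(LETTERS[1:3], Ng))
m1 <- lm(data ~ 0 + fact)

cf <- coef(summary(m1))
t_manual <- cf[, "Estimate"] / cf[, "Std. Error"]  # beta-hat / se(beta-hat)
all.equal(t_manual, cf[, "t value"])               # TRUE
```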

Note that, under some regularity conditions, the statistic above is always *asymptotically* normally distributed, regardless of whether the errors are normal or whether the error variance is known.
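A small simulation can illustrate this asymptotic claim. Here is a sketch (the sample size, error distribution, and true slope are arbitrary choices for illustration): with skewed, centered exponential errors, the $t$-statistic for the slope still has mean near $0$ and standard deviation near $1$ across replications.

```r
# Sampling distribution of the t-statistic with non-normal errors
set.seed(1)
tstats <- replicate(2000, {
  x <- rnorm(200)
  y <- 2 * x + (rexp(200) - 1)  # centered, skewed errors
  s <- coef(summary(lm(y ~ x)))
  (s["x", "Estimate"] - 2) / s["x", "Std. Error"]
})
c(mean = mean(tstats), sd = sd(tstats))  # both close to N(0, 1) values
```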

**(2)** The reason you’re getting different results is that the percentiles of the normal distribution differ from the percentiles of the $t$-distribution. Therefore, the multiplier you’re using in front of the standard error is different, which, in turn, gives different confidence intervals.

Specifically, recall that the confidence interval using the normal distribution is

$$ \hat{\beta} \pm z_{\alpha/2} \cdot {\rm se}(\hat{\beta}) $$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile (i.e. the $1-\alpha/2$ quantile) of the standard normal distribution. In the standard case of a $95\%$ confidence interval, $\alpha = .05$ and $z_{\alpha/2} \approx 1.96$. The confidence interval based on the $t$-distribution is

$$ \hat{\beta} \pm t_{\alpha/2,n-p} \cdot {\rm se}(\hat{\beta}) $$

where the multiplier $t_{\alpha/2,n-p}$ is the corresponding quantile of the $t$-distribution with $n-p$ degrees of freedom, where $n$ is the sample size and $p$ is the number of estimated coefficients (here $p = 3$). When $n$ is large, $t_{\alpha/2,n-p}$ and $z_{\alpha/2}$ are about the same.
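A quick way to see the two multipliers side by side, using $p = 3$ as in the ANOVA above (the sample sizes are arbitrary examples):

```r
# Normal vs t multipliers for a 95% CI at several sample sizes (p = 3)
n <- c(10, 30, 120, 1000)
cbind(n, z = qnorm(0.975), t = qt(0.975, df = n - 3))
```

The $t$ multiplier exceeds $1.96$ at every $n$ but shrinks toward it as $n$ grows, which is exactly why the two intervals in the question disagree (the $t$-based one is slightly wider).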

Below is a plot of the $t$ multipliers for sample sizes ranging from $5$ to $300$ (I’ve assumed $p=1$ for this plot, but that qualitatively changes nothing). The $t$-multipliers are larger, but, as you can see below, they do converge to the $z$ (solid black line) multiplier as the sample size increases.
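The plot described above can be reproduced with a few lines of base R (a sketch, assuming $p = 1$ as stated):

```r
# t multipliers for n = 5..300 (dashed) vs the z multiplier (solid)
n <- 5:300
plot(n, qt(0.975, df = n - 1), type = "l", lty = 2,
     xlab = "sample size n", ylab = "95% CI multiplier")
abline(h = qnorm(0.975))  # solid black line: z multiplier, about 1.96
```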

**Attribution:** *Source: Link, Question Author: Tomas, Answer Author: Macro*