Basic statistics courses often suggest using a normal distribution to estimate a population mean when the sample size n is large (typically over 30 or 50). Student’s t-distribution is used for smaller sample sizes to account for the uncertainty in the sample standard deviation. When the sample size is large, the sample standard deviation gives good information about the population standard deviation, allowing for a normal-distribution estimate. I get that.
But why use an estimate when you can get your confidence interval exactly? Regardless of sample size, what’s the point of using the normal distribution if it’s just an approximation to something you can get exactly with the t-distribution?
Just to clarify in relation to the title: we aren’t using the t-distribution to estimate the mean (in the sense of a point estimate, at least), but to construct an interval for it.
But why use an estimate when you can get your confidence interval exactly?
It’s a good question (as long as we don’t get too insistent on ‘exactly’, since the assumptions under which the statistic is exactly t-distributed won’t actually hold).
“You must use the t-distribution table when working problems when the population standard deviation (σ) is not known and the sample size is small (n<30)”
Why don’t people use the T-distribution all the time when the population standard deviation is not known (even when n>30)?
I regard that advice as, at best, potentially misleading. In some situations, the t-distribution should still be used when the degrees of freedom are a good deal larger than 30.
How large the d.f. must be before the normal is a reasonable approximation depends on a variety of things (and so depends on the situation). However, since (with computers) it’s not at all difficult to just use the $t$, even if the d.f. are very large, you’d have to wonder why there’s any need to do something different at n=30.
If the sample sizes are really large, it won’t make a noticeable difference to a confidence interval, but I don’t think n=30 is always sufficiently close to ‘really large’.
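To get a feel for how much difference it makes, here is a minimal sketch that compares the two-sided 95% critical values of the normal and the $t$ at a few degrees of freedom. It uses only the Python standard library, so the $t$ quantile is found by numerically integrating the density and bisecting; the helper names (`t_pdf`, `t_cdf`, `t_quantile`) are made up for this illustration, and in practice a statistics package would do this for you.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection; this sketch assumes p >= 0.5 and df >= 2."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # normal critical value for a 95% interval
for df in (5, 10, 29, 100, 1000):
    t = t_quantile(0.975, df)
    # The ratio t/z is how much wider the t interval is than the normal one.
    print(f"df={df:5d}  z={z:.3f}  t={t:.3f}  ratio={t / z:.3f}")
```

At 29 d.f. (that is, n=30 for a one-sample interval) the $t$ critical value is about 2.045 against about 1.960 for the normal, so the normal-based interval is roughly 4% too narrow; by 1000 d.f. the two are nearly indistinguishable.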
There is one circumstance in which it might make sense to use the normal rather than the $t$ – that’s when your data clearly don’t satisfy the conditions to get a t-distribution, but you can still argue for approximate normality of the mean (if $n$ is quite large). However, in those circumstances, often the t is a good approximation in practice, and may be somewhat ‘safer’. [In a situation like that, I might be inclined to investigate via simulation.]
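As a rough idea of what such a simulation might look like, the sketch below draws repeated samples of size 30 from a skewed distribution and counts how often the nominal 95% normal-based and $t$-based intervals cover the true mean. The choices here are assumptions made for illustration, not part of the original discussion: exponential data with mean 1, n=30, and a hard-coded $t$ critical value of 2.045 (the standard table value for 29 d.f.). The function name `simulate_coverage` is likewise hypothetical.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.960
T_CRIT_29 = 2.045                     # table value for t with 29 d.f. (n = 30 only)


def simulate_coverage(n=30, reps=20000, seed=0, t_crit=T_CRIT_29):
    """Estimate coverage of nominal 95% z and t intervals for the mean.

    Assumption for illustration: data are exponential with true mean 1.
    t_crit must match n - 1 degrees of freedom; the default is for n = 30.
    """
    rng = random.Random(seed)
    true_mean = 1.0
    z_hits = t_hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
        se = s / math.sqrt(n)
        err = abs(m - true_mean)
        z_hits += err <= Z_CRIT * se   # normal-based interval covers?
        t_hits += err <= t_crit * se   # t-based interval covers?
    return z_hits / reps, t_hits / reps


z_cov, t_cov = simulate_coverage()
print(f"normal-based coverage: {z_cov:.3f}")
print(f"t-based coverage:      {t_cov:.3f}")
```

Because both intervals share the same centre and the $t$ interval is wider, its estimated coverage can never be lower than the normal-based one in this setup; the question the simulation answers is how far each falls short of the nominal 95% for the kind of data you actually have. Swapping in a different data-generating line (and a matching critical value if you change n) lets you check your own situation.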