I was reading about computing an unbiased estimate of the standard deviation, and the source I read stated:
> (…) except in some important situations, the task has little relevance to applications of statistics since its need is avoided by standard procedures, such as the use of significance tests and confidence intervals, or by using Bayesian analysis.
Could anyone elucidate the reasoning behind this statement? For example, doesn’t a confidence interval use the standard deviation as part of its calculation? If so, wouldn’t the confidence interval be affected by a biased estimate of the standard deviation?
Thanks for the answers so far, but I’m not sure I follow some of the reasoning, so I’ll add a very simple example. The point is that if the source is correct, then something must be wrong in the conclusion I draw from the example, and I would like someone to point out how the p-value doesn’t depend on the standard deviation.
Suppose a researcher wished to test, at a significance level of 0.05, whether the mean score of fifth graders in his or her city on a test differed from the national mean of 76. The researcher randomly sampled the scores of 20 students. The sample mean was 80.85, with a sample standard deviation of 8.87. This gives $t = \frac{80.85-76}{8.87/\sqrt{20}} \approx 2.44$. A t-table then shows that the two-tailed probability of a t of 2.44 with 19 df is about 0.025. This is below our significance level of 0.05, so we reject the null hypothesis.
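The arithmetic in the example can be reproduced with a short sketch. To stay dependency-free it assumes only Python's standard library and computes the two-sided p-value by integrating the Student t density numerically (Simpson's rule) instead of reading a t-table. The helper names `t_pdf`, `two_sided_p` and `one_sample_t` are hypothetical, introduced only for illustration.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, n_steps: int = 10_000) -> float:
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    a, b = 0.0, abs(t)
    h = (b - a) / n_steps  # n_steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3  # approximates P(0 <= T <= |t|)
    return 1.0 - 2.0 * central


def one_sample_t(mean: float, mu0: float, sd: float, n: int):
    """One-sample t statistic from summary statistics, with its two-sided p-value."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, two_sided_p(t, n - 1)


# Summary statistics from the example above.
t, p = one_sample_t(80.85, 76, 8.87, 20)
print(f"t = {t:.3f}, two-sided p = {p:.4f}")
```

The printed values should agree with the table-based figures above up to rounding. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), 19)` is a more direct way to get the same two-sided p-value.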
So in this example, wouldn’t the p-value (and perhaps the conclusion) change depending on how you estimated the population standard deviation?
I agree with Glen_b on this. Maybe I can add a few words to make the point even clearer. If the data come from a normal distribution (the iid situation) with unknown variance, the t statistic is the pivotal quantity used to construct confidence intervals and do hypothesis testing. The only thing that matters for that inference is its distribution under the null hypothesis (to determine the critical value) and under the alternative (to determine power and sample size). Those are the central and noncentral t distributions, respectively. Considering for a moment the one-sample problem, the t test even has optimal properties as a test for the mean of a normal distribution.

Now, the sample variance is an unbiased estimator of the population variance, but its square root $s$ is a BIASED estimator of the population standard deviation. It doesn’t matter that this BIASED estimator enters the denominator of the pivotal quantity: the t distribution is the exact distribution of the statistic built with $s$ itself, so the bias is already accounted for in the critical values and p-values. If you divided $s$ by a known constant to remove its bias, the statistic, its null distribution and its critical value would all be rescaled by that same constant, and the p-value and the decision would not change. What does play a role is that $s$ is a consistent estimator. That is what allows the t distribution to approach the standard normal as the sample size goes to infinity. But being biased for any fixed $n$ does not affect the nice properties of the test.
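The rescaling argument can be checked with a small simulation. This is a sketch under stated assumptions: normal data generated under a hypothetical null model (mean 76, standard deviation 9, n = 20), and the normal-theory constant $c_4(n)$, defined by $E[s] = c_4(n)\,\sigma$, used to build the bias-corrected estimate $s/c_4(n)$ (it is only unbiased for normal data). Scaling the critical value by the same constant reproduces exactly the same rejections as the usual test, whereas naively reusing the ordinary cutoff with the corrected statistic makes the test conservative.

```python
import math
import random
import statistics


def c4(n: int) -> float:
    """Bias-correction constant for normal data: E[s] = c4(n) * sigma."""
    return math.sqrt(2 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


# Hypothetical null model, chosen only for this simulation.
N, MU0, SIGMA = 20, 76.0, 9.0
T_CRIT = 2.093  # t_{0.975, 19}, from a standard t table
c = c4(N)

rng = random.Random(1)  # fixed seed so the run is reproducible
trials = 20_000
reject_s = reject_unbiased = reject_naive = 0
for _ in range(trials):
    x = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(x)
    s = statistics.stdev(x)  # usual sample SD (divisor n - 1), biased for sigma
    s_unb = s / c            # bias-corrected SD estimate
    t = (xbar - MU0) / (s / math.sqrt(N))
    t_unb = (xbar - MU0) / (s_unb / math.sqrt(N))  # equals c * t
    reject_s += abs(t) > T_CRIT
    # The null distribution of t_unb is c * T, so the cutoff is scaled by c too.
    reject_unbiased += abs(t_unb) > c * T_CRIT
    # Naive: corrected statistic compared with the unscaled t cutoff.
    reject_naive += abs(t_unb) > T_CRIT

print(f"c4({N}) = {c:.4f}")
print("rejection rate, usual s:                    ", reject_s / trials)
print("rejection rate, corrected s, scaled cutoff: ", reject_unbiased / trials)
print("rejection rate, corrected s, naive cutoff:  ", reject_naive / trials)
```

Because $c_4(n) < 1$, the corrected statistic is always smaller in magnitude than the usual one, so the naive version can only reject less often; the properly rescaled version rejects in exactly the same samples as the ordinary t test.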
In my opinion, unbiasedness is overemphasized in introductory statistics classes. Accuracy and consistency of estimators are the properties that really deserve emphasis.
For other problems, where different parametric or nonparametric methods are applied, an estimate of the standard deviation may not even enter the formula; rank-based tests such as the Wilcoxon signed-rank test are an example.
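To illustrate that last point, here is a sketch of an exact two-sided sign test of a hypothesized median. It only counts observations above and below the hypothesized value, so no standard deviation estimate appears anywhere. Note that it tests the median rather than the mean. The scores are hypothetical, since the example above reports only summary statistics, and the function name is likewise illustrative.

```python
from math import comb


def sign_test_two_sided(data, m0):
    """Exact two-sided sign test of H0: median == m0 (values equal to m0 are dropped)."""
    above = sum(x > m0 for x in data)
    below = sum(x < m0 for x in data)
    n = above + below
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical scores, for illustration only.
scores = [82, 91, 74, 79, 88, 85, 70, 77, 93, 81]
print(sign_test_two_sided(scores, 76))
```

Here 8 of the 10 hypothetical scores lie above 76 and 2 below, so the p-value is $2 \cdot (1 + 10 + 45)/2^{10} = 0.109375$, computed from counts alone.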