Let \{X_i\}_{i=1}^n be a family of i.i.d. random variables taking values in [0,1], with mean \mu and variance \sigma^2. A simple confidence interval for the mean, using \sigma whenever it is known, follows from Chebyshev’s inequality:

P( | \bar X - \mu| > \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \le \frac{1}{n \varepsilon^2}. \qquad (1)
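
As a concrete illustration of (1), here is a minimal Python sketch (the helper name is mine) that turns the distribution-free bound into a guaranteed confidence-interval half-width, using only the crude fact that \sigma^2 \le 1 for a [0,1]-valued variable:

```python
import math

def chebyshev_half_width(n, confidence=0.95):
    """Half-width eps with P(|Xbar - mu| > eps) <= 1 - confidence,
    from bound (1) with the crude estimate sigma^2 <= 1 for X in [0, 1]."""
    alpha = 1 - confidence
    # Solve 1 / (n * eps^2) = alpha for eps.
    return math.sqrt(1 / (n * alpha))

print(chebyshev_half_width(1000))  # ≈ 0.1414, valid for *any* [0,1]-valued distribution
```

The interval is conservative, but it holds at every finite n, with no asymptotics involved.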

Also, because \frac{\bar X- \mu}{\sigma/\sqrt{n}} is asymptotically distributed as a standard normal random variable, the normal distribution is sometimes used to “construct” an approximate confidence interval.
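
For comparison, a sketch of the CLT-based interval (the function name, and the plugged-in \sigma = 1/2, the largest possible standard deviation on [0,1], are mine):

```python
from statistics import NormalDist

def normal_half_width(n, sigma, confidence=0.95):
    """Approximate half-width z_{alpha/2} * sigma / sqrt(n) from the CLT."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    return z * sigma / n ** 0.5

print(normal_half_width(1000, 0.5))  # ≈ 0.031
```

At n = 1000 and 95% confidence, (1) with \sigma^2 \le 1 guarantees a half-width of \sqrt{1/(0.05 \cdot 1000)} \approx 0.141, so the Gaussian interval is several times narrower; the catch is that its coverage is only asymptotic.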

In multiple-choice statistics exams, I’ve had to use this approximation instead of (1) whenever n \geq 30. I’ve always felt very uncomfortable with this (more than you can imagine), since the approximation error is not quantified.

Why use the normal approximation rather than (1)?

I don’t want, ever again, to blindly apply the rule n \geq 30. Are there good references that can support me in a refusal to do so and provide appropriate alternatives? ((1) is an example of what I consider an appropriate alternative.)

Here, while \sigma and the third absolute moment E[ |X|^3] are unknown, they are easily bounded, since the X_i take values in [0,1].
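
One way such bounds pay off is through the Berry–Esseen theorem, which gives an explicit finite-n bound |P(\sqrt{n}(\bar X - \mu)/\sigma \le x) - \Phi(x)| \le C\rho/(\sigma^3 \sqrt{n}), with \rho = E|X - \mu|^3 and C \le 0.4748 (Shevtsova’s constant). A sketch, where the assumed lower bound on \sigma is mine:

```python
import math

C = 0.4748  # an upper bound on the Berry–Esseen constant (Shevtsova, 2011)

def berry_esseen_error(n, sigma_lower):
    """Uniform bound on the CLT approximation error for X in [0, 1]:
    rho = E|X - mu|^3 <= sigma^2 (since |X - mu| <= 1), so the bound
    C * rho / (sigma^3 * sqrt(n)) is at most C / (sigma * sqrt(n));
    sigma_lower is an assumed lower bound on the unknown sigma."""
    return C / (sigma_lower * math.sqrt(n))

print(berry_esseen_error(30, 0.2))  # ≈ 0.43
```

If the best available lower bound on \sigma is small, the guarantee at n = 30 is essentially vacuous, which is exactly the discomfort described above.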

Please note that my question is a reference request, particularly about confidence intervals, and therefore differs from the questions that were suggested as partial duplicates here and here. It is not answered there.

**Answer**

Why use the normal approximation?

It’s as simple as this: it’s always better to use more information than less. Equation (1) uses Chebyshev’s inequality. Note how it doesn’t use any information about your distribution’s shape, i.e. it works for any distribution with a given variance. Hence, if you use some information about your distribution’s shape, you get a better approximation. If you knew that your distribution was Gaussian, then by using this knowledge you would get a better estimate.

Since you’re already applying the central limit theorem, why not use the Gaussian approximation for the bounds? They’re going to be tighter (or sharper), because these estimates are based on knowledge of the shape, which is an additional piece of information.
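
To make “tighter” concrete, here is a sketch comparing the two tail estimates at the same n and \varepsilon (the numeric values are mine, chosen only for illustration):

```python
import math
from statistics import NormalDist

def chebyshev_tail(n, eps, sigma):
    """Chebyshev bound from (1): P(|Xbar - mu| > eps) <= sigma^2 / (n * eps^2)."""
    return min(1.0, sigma ** 2 / (n * eps ** 2))

def gaussian_tail(n, eps, sigma):
    """CLT approximation of the same tail: 2 * (1 - Phi(eps * sqrt(n) / sigma))."""
    return 2 * (1 - NormalDist().cdf(eps * math.sqrt(n) / sigma))

# With sigma = 0.5, n = 100, eps = 0.1:
print(chebyshev_tail(100, 0.1, 0.5))  # 0.25
print(gaussian_tail(100, 0.1, 0.5))   # ≈ 0.0455
```

Same data, same \varepsilon: the shape assumption shrinks the tail estimate by a factor of about five here.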

The rule of thumb n \geq 30 is a myth that benefits from confirmation bias. It just keeps being copied from one book to another. I once found a reference suggesting this rule in a paper from the 1950s. As I recall, it wasn’t any kind of solid proof; it was some sort of empirical study. Basically, the only reason the rule is used is that it sort of works: you don’t often see it badly violated.

UPDATE

Look up the paper by Zachary R. Smith and Craig S. Wells, “Central Limit Theorem and Sample Size”. They present an empirical study of convergence to the CLT for different kinds of distributions. The magic number 30 doesn’t work in many cases, of course.
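
Their kind of experiment is easy to reproduce. A sketch, assuming a heavily skewed Beta(0.1, 10) population (my choice) and checking how often the nominal 95% CLT interval actually covers the true mean at n = 30:

```python
import random
from statistics import NormalDist, mean, stdev

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
mu = 0.1 / (0.1 + 10)            # true mean of Beta(0.1, 10)

def coverage(n=30, reps=20_000, seed=0):
    """Fraction of simulated samples whose CLT interval covers the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.betavariate(0.1, 10) for _ in range(n)]
        m, s = mean(xs), stdev(xs)
        hits += abs(m - mu) <= z * s / n ** 0.5
    return hits / reps

print(coverage())
```

In runs of this kind, skewness this extreme typically drags the empirical coverage below the nominal level; see the paper for a systematic study.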

**Attribution**
*Source: Link, Question Author: Olivier, Answer Author: Aksakal*