# Large-sample asymptotics/theory: why should we care?

I hope that this question does not get marked “as too general” and hope a discussion gets started that benefits all.

In statistics, we spend a lot of time learning large-sample theory. We are deeply interested in the asymptotic properties of our estimators: whether they are asymptotically unbiased, whether they are asymptotically efficient, what their asymptotic distribution is, and so on. The word asymptotic is strongly tied to the assumption that $n \rightarrow \infty$.

In reality, however, we always deal with finite $n$. My questions are:

1) What do we mean by a large sample? How can we distinguish between small and large samples?

2) When we say $n \rightarrow \infty$, do we literally mean that $n$ should go to $\infty$?

E.g., for the binomial distribution, $\bar{X}$ is said to need about $n = 30$ for the normal approximation from the CLT to work. Do we need $n \rightarrow \infty$, or does $\infty$ in this case mean 30 or more?

3) Suppose we have a finite sample and suppose that we know everything about the asymptotic behavior of our estimators. So what? If our estimators are asymptotically unbiased, do we then have an unbiased estimate of our parameter of interest in our finite sample, or does it only mean that we would have an unbiased one if $n \rightarrow \infty$?

As you can see from the questions above, I’m trying to understand the philosophy behind “Large Sample Asymptotics” and to learn why we care. I need some intuition for the theorems I’m learning.

Better late than never. Let me first list three (I think important) reasons why we focus on asymptotic properties of estimators, such as asymptotic unbiasedness and consistency.

a) Consistency is a minimum criterion. If an estimator doesn’t correctly estimate even with lots of data, then what good is it? This is the justification given in Wooldridge: Introductory Econometrics.

b) Finite sample properties are much harder to prove (or rather, asymptotic statements are easier). I am currently doing some research myself, and whenever you can rely on large sample tools, things get much easier. Laws of large numbers, martingale convergence theorems etc. are nice tools for getting asymptotic results, but don’t help with finite samples. I believe something along these lines is mentioned in Hayashi (2000): Econometrics.

c) If estimators are biased in small samples, one can potentially correct or at least improve them with so-called small sample corrections. These are often complicated theoretically (it is hard to prove that they improve on the estimator without the correction). Plus, most people are fine with relying on large samples, so small sample corrections are often not implemented in standard statistics software, because only a few people require them (those who can’t get more data AND care about unbiasedness). Thus, there are certain barriers to using those uncommon corrections.

On your questions. What do we mean by “large sample”? This depends heavily on the context, and for specific tools it can be answered via simulation. That is, you artificially generate data and see how, say, the rejection rate or the bias behaves as a function of sample size. A specific example is here, where the authors see how many clusters it takes for OLS clustered standard errors, block bootstrapped standard errors etc. to perform well. Some theorists also have statements on the rate of convergence, but for practical purposes the simulations appear to be more informative.
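To make the simulation idea concrete, here is a minimal sketch. The setup is my own illustrative choice and not taken from the study mentioned above: Exponential(1) data, a nominal 5% two-sided z-test of the true mean, and the hypothetical helper name `rejection_rate`. Since the null hypothesis is true, a rejection rate well above 0.05 indicates that the sample is still "small" for this particular tool.

```python
import numpy as np


def rejection_rate(n, n_sims=10_000, seed=0):
    """Share of simulated samples in which a nominal 5% two-sided z-test
    rejects a TRUE null hypothesis (mean = 1) for Exponential(1) data."""
    rng = np.random.default_rng(seed)
    # One row per simulated sample of size n.
    x = rng.exponential(scale=1.0, size=(n_sims, n))
    mean = x.mean(axis=1)
    se = x.std(axis=1, ddof=1) / np.sqrt(n)
    z = (mean - 1.0) / se  # the true mean is 1, so H0 holds
    # 1.96 is the asymptotic (normal) critical value for a 5% test.
    return float(np.mean(np.abs(z) > 1.96))


# Rejection rate as a function of sample size.
rates = {n: rejection_rate(n) for n in (5, 20, 100, 500)}
for n, rate in rates.items():
    print(f"n = {n:4d}: rejection rate = {rate:.3f}")
```

The same loop works for bias: replace the test statistic by the estimator and average its deviation from the true value.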

Does it really take $n\to \infty$? If that’s what the theory says, yes, but in applications we can accept a small, negligible bias, which is what we get, with high probability, once the sample is sufficiently large. What “sufficiently” means depends on the context, see above.
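For the binomial example from the question, "sufficiently large" can even be checked without simulation. The sketch below (my own illustration, with the hypothetical helper names `normal_cdf` and `max_cdf_error`, and no continuity correction) computes the largest gap between the exact Binomial($n$, $p$) CDF and its normal approximation. The point is that the gap depends on $p$ as well as on $n$, so there is no universal threshold such as 30.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def max_cdf_error(n, p):
    """Largest absolute gap between the exact Binomial(n, p) CDF and the
    CDF of N(np, np(1-p)), evaluated at k = 0, ..., n."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)  # exact P(X <= k)
        worst = max(worst, abs(cdf - normal_cdf((k - mu) / sd)))
    return worst


for p in (0.5, 0.05):
    for n in (30, 100, 1000):
        print(f"p = {p}, n = {n:4d}: max CDF error = {max_cdf_error(n, p):.4f}")
```

Whether a given error is "negligible" is still a judgment that depends on the application.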

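On question 3: unbiasedness (a finite-sample property that holds for every sample size) and consistency (the estimator converges in probability to the true parameter as $n \to \infty$) are usually considered separately. An estimator can be biased but consistent, in which case the bias typically only becomes negligible in large samples; asymptotic unbiasedness says that the bias vanishes in the limit, not that it is zero in your finite sample. But there are also estimators that are both unbiased and consistent, which are theoretically applicable for any sample size. (An estimator can also be unbiased but inconsistent, e.g. one that uses only the first observation $X_1$ to estimate the mean: it is unbiased for every $n$ but does not improve with more data.)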
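As a worked example of a biased but consistent estimator (a standard textbook case, not taken from the references above), assume i.i.d. observations $X_1, \dots, X_n$ with variance $\sigma^2$ and consider the variance estimator that divides by $n$ instead of $n - 1$:

$$
\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2,
\qquad
E\left[\hat\sigma_n^2\right] = \frac{n-1}{n}\,\sigma^2,
\qquad
E\left[\hat\sigma_n^2\right] - \sigma^2 = -\frac{\sigma^2}{n} \to 0 .
$$

For every finite $n$ the estimator is biased downward, but the bias shrinks at rate $1/n$: with $n = 10$ it is 10% of $\sigma^2$, with $n = 1000$ it is 0.1%. So asymptotic unbiasedness tells you the bias disappears as the sample grows, not that your finite-sample estimate is unbiased.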