# How does the sampling distribution of sample means approximate the population mean?

I am trying to learn statistics because it is so prevalent that not understanding it properly keeps me from learning other things. I am having trouble with the notion of a sampling distribution of the sample mean, and I can’t follow the way some books and sites explain it. I think I understand it, but I am unsure if my understanding is correct. Below is my attempt.

When we talk about some phenomenon taking on a normal distribution, it is generally (not always) concerning the population.

We want to use inferential statistics to estimate something about a population, such as its mean, but we don’t have all the data. So we use random sampling, where every possible sample of size n is equally likely to be selected.

So we take lots of samples, let’s say 100, each of size n, and the distribution of the means of those samples will be approximately normal according to the central limit theorem. The mean of the sample means will approximate the population mean.
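A minimal sketch of the procedure in the previous paragraph, to make the picture concrete. The population here is made up (100,000 roughly normal values with mean 50 and standard deviation 10), and the sample size of 30 is an arbitrary choice; neither comes from a real data set.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration only).
population = [rng.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_mean(n):
    """Draw one simple random sample of size n (without replacement)
    from the population and return its mean."""
    return statistics.fmean(rng.sample(population, n))


# Take 100 separate samples of 30 values each and record each sample's mean.
sample_means = [sample_mean(30) for _ in range(100)]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print("population mean:      ", round(population_mean, 2))
print("mean of sample means: ", round(mean_of_means, 2))
print("sd of sample means:   ", round(spread_of_means, 2))
```

The list `sample_means` is an empirical stand-in for the sampling distribution of the sample mean: its center should sit near the population mean, and its spread should be much smaller than the spread of the population itself.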

Now what I don’t understand is that a lot of the time you see “A sample of 100 people…”. Wouldn’t we need tens or hundreds of samples of 100 people each to approximate the population mean? Or can we take a single sample that’s large enough, say 1000, and say that its mean will approximate the population mean? Or do we take one sample of 1000 people, draw 100 random subsamples of 100 people each from those 1000, and use those as our approximation?
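The last two options can be compared side by side in a rough sketch. It assumes the same kind of made-up population as before (roughly normal, mean 50, standard deviation 10); the variable names are only illustrative. It takes one sample of 1000, estimates the mean and its standard error from that single sample, and then also averages 100 subsamples of 100 drawn from those same 1000 values.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration only).
population = [rng.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Option A: one single sample of 1000 values.
one_sample = rng.sample(population, 1000)
single_estimate = statistics.fmean(one_sample)

# Standard error of the mean, estimated from that one sample alone:
# sample standard deviation divided by sqrt(sample size).
standard_error = statistics.stdev(one_sample) / math.sqrt(len(one_sample))

# Option B: 100 subsamples of 100 values, all drawn from the SAME 1000 values.
subsample_means = [
    statistics.fmean(rng.sample(one_sample, 100)) for _ in range(100)
]
subsample_estimate = statistics.fmean(subsample_means)

print("population mean:            ", round(population_mean, 2))
print("single-sample estimate:     ", round(single_estimate, 2))
print("estimated standard error:   ", round(standard_error, 2))
print("mean of 100 subsample means:", round(subsample_estimate, 2))
```

In this sketch the subsample average lands very close to the mean of the original 1000 values, since the subsamples only reuse those same values. The standard error line shows that a single sample already carries an estimate of how far its mean is likely to be from the population mean.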

Does taking a large enough sample to approximate the mean (almost) always work? Does the population even need to be normal for this to work?
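One way to probe both questions is a small simulation with a population that is clearly not normal. The sketch below assumes an exponential distribution with mean 1 (strongly right-skewed, but with finite mean and variance); the sample sizes and the 500 repetitions are arbitrary choices. It does not cover heavy-tailed cases where the variance is not finite.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

POP_MEAN = 1.0  # an exponential distribution with rate 1 has mean 1 and sd 1
REPS = 500      # number of repeated samples per sample size


def sample_mean(n):
    """Mean of n independent draws from the skewed (exponential) population."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


# For each sample size, store:
# (mean of sample means, sd of sample means, theoretical sd = 1 / sqrt(n))
results = {}
for n in (5, 50, 500):
    means = [sample_mean(n) for _ in range(REPS)]
    results[n] = (
        statistics.fmean(means),
        statistics.stdev(means),
        1.0 / math.sqrt(n),
    )
    print(n, [round(x, 3) for x in results[n]])
```

If the intuition is right, the mean of the sample means should stay near 1 for every sample size, while the spread of the sample means should shrink roughly like 1/sqrt(n) as n grows, even though the population itself is skewed.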