Are “random sample” and “iid random variable” synonyms?

I have been having a hard time understanding the meaning of “random sample” and “iid random variable”. I tried to find the meaning in several sources, but only got more and more confused. Here is what I tried and found:

DeGroot’s Probability and Statistics says:

Random Samples / i.i.d. / Sample Size: Consider a given probability distribution on the real line that can be represented by either a p.f. or a p.d.f. $f$. It is said that $n$ random variables $X_1,\dots,X_n$ form a random sample from this distribution if these random variables are independent and the marginal p.f. or p.d.f. of each of them is $f$. Such random variables are also said to be independent and identically distributed, abbreviated i.i.d. We refer to the number $n$ of random variables as the sample size.

But another statistics book I have says:

In random sampling, we guarantee that every individual unit in the population gets an equal chance (probability) of being selected.

So I have a feeling that i.i.d. random variables are the elements that make up a random sample, and that random sampling is the procedure for obtaining a random sample. Am I right?

P.S.: I am very confused about this topic, so I would appreciate a detailed reply. Thanks.

Answer

You don’t say what the other statistics book is, but I’d guess that it is a
book (or section) about finite population sampling.

When you sample random variables, i.e. when you consider a set
$X_1,\dots,X_n$ of $n$ random variables, you know that if they are
independent, $f(x_1,\dots,x_n)=f(x_1)\cdots f(x_n)$, and identically distributed,
in particular $E(X_i)=\mu$ and $\text{Var}(X_i)=\sigma^2$ for all $i$, then:
$$\overline{X}=\frac{\sum_i X_i}{n},\quad E(\overline{X})=\mu,\quad\text{Var}(\overline{X})=\frac{\sigma^2}{n}$$
where $\sigma^2$ is the second central moment.
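A quick way to check these formulas is to enumerate every possible sample exactly. The sketch below assumes a hypothetical three-point distribution `f` (the values and probabilities are made up for illustration). It builds the joint p.f. as the product $f(x_1)\cdots f(x_n)$, which is what independence gives you, and compares the exact mean and variance of $\overline{X}$ with $\mu$ and $\sigma^2/n$, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete distribution f: value -> probability (illustrative only)
f = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}
n = 3  # sample size

mu = sum(x * p for x, p in f.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in f.items())  # second central moment

# Independence: the joint p.f. of (X_1, ..., X_n) is f(x_1) * ... * f(x_n)
mean_xbar = Fraction(0)
second_moment_xbar = Fraction(0)
for xs in product(f, repeat=n):
    p = Fraction(1)
    for x in xs:
        p *= f[x]
    xbar = Fraction(sum(xs), n)
    mean_xbar += xbar * p
    second_moment_xbar += xbar ** 2 * p

var_xbar = second_moment_xbar - mean_xbar ** 2
print(mean_xbar == mu, var_xbar == sigma2 / n)  # True True
```

Exact enumeration only works for tiny supports and small $n$; a Monte Carlo simulation would only approximate the same two equalities.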

Sampling a finite population is somewhat different. If the population is of
size $N$, in sampling without replacement there are $\binom{N}{n}$ possible
samples $s_i$ of size $n$ and they are equiprobable:
$$p(s_i)=\frac{1}{\binom{N}{n}}\quad\forall i=1,\dots,\binom{N}{n}$$
For example, if $N=5$ and $n=3$, the sample space is $\{s_1,\dots,s_{10}\}$
and the possible samples are:
\begin{gather}s_1=\{1,2,3\},s_2=\{1,2,4\},s_3=\{1,2,5\},s_4=\{1,3,4\},s_5=\{1,3,5\},\\
s_6=\{1,4,5\},s_7=\{2,3,4\},s_8=\{2,3,5\},s_9=\{2,4,5\},s_{10}=\{3,4,5\}\end{gather}
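The enumeration above can be reproduced with a short Python sketch (labelling the individuals $1,\dots,N$ is the only assumption). It also computes each individual's inclusion probability, i.e. the sum of $p(s_i)$ over the samples that contain that individual.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 5, 3
population = range(1, N + 1)  # individuals labelled 1..N

# All samples of size n drawn without replacement, in lexicographic order
samples = list(combinations(population, n))
assert len(samples) == comb(N, n)  # 10 samples

p_sample = Fraction(1, comb(N, n))  # each sample is equiprobable

# Inclusion probability: sum of p(s_i) over the samples containing individual k
inclusion = {k: sum(p_sample for s in samples if k in s) for k in population}

for i, s in enumerate(samples, start=1):
    print(f"s_{i} = {set(s)}")
print(inclusion)
```

Every individual ends up with the same inclusion probability, $6/10 = 3/5$, because each one appears in $\binom{4}{2}=6$ of the 10 samples.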

If you count the number of occurrences of each individual, you can see that
each one appears in six samples, i.e. each individual has an equal chance of being selected ($6/10$). So each $s_i$ is a random sample according to the second definition. Roughly speaking, it is not an i.i.d. random sample, because individuals
are not random variables: you can consistently estimate $E[X]$ by a sample mean but you will
never know its exact value, whereas you can know the exact population mean if $n=N$ (let me repeat: roughly).${}^1$

Let $\mu$ be some population mean (mean height, mean income, …). When $n<N$
you can estimate $\mu$ as in random variable sampling:
$$\overline{y}_s=\frac{1}{n}\sum_{i=1}^n y_i,\quad E(\overline{y}_s)=\mu$$
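but the variance of the sample mean is different:
$$\text{Var}(\overline{y}_s)=\frac{\tilde\sigma^2}{n}\left(1-\frac{n}{N}\right)$$
where $\tilde\sigma^2$ is the population quasi-variance:
$$\tilde\sigma^2=\frac{\sum_{i=1}^N(y_i-\mu)^2}{N-1}.$$
The factor $(1-n/N)$ is usually called the "finite population correction factor"; it reflects the fact that draws without replacement are not independent, and it makes the variance vanish when $n=N$.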
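The variance formula can be checked by brute force. This is a minimal sketch, assuming a hypothetical population of five values (made-up numbers, not data): it enumerates all $\binom{N}{n}$ equiprobable samples, computes the exact expectation and variance of $\overline{y}_s$, and compares them with $\mu$ and $\frac{\tilde\sigma^2}{n}\left(1-\frac{n}{N}\right)$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population values y_1..y_N (e.g. incomes); illustrative only
y = [2, 4, 7, 8, 9]
N, n = len(y), 3

mu = Fraction(sum(y), N)                               # population mean
quasi_var = sum((yi - mu) ** 2 for yi in y) / (N - 1)  # population quasi-variance

# Sample means over all C(N, n) equiprobable samples without replacement
means = [Fraction(sum(s), n) for s in combinations(y, n)]
p = Fraction(1, comb(N, n))

e_ybar = sum(m * p for m in means)
var_ybar = sum((m - e_ybar) ** 2 * p for m in means)

fpc = 1 - Fraction(n, N)  # finite population correction factor
print(e_ybar == mu, var_ybar == quasi_var / n * fpc)  # True True
```

Setting `n = N` in this sketch leaves a single sample whose mean is exactly $\mu$, so the computed variance is zero, matching the correction factor.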

This is a quick example of how a (random variable) i.i.d. random sample and a
(finite population) random sample may differ. Statistical inference is mainly about
random variable sampling, while sampling theory is about finite population sampling.
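To make the contrast concrete, the sketch below assumes a small hypothetical population (made-up values) and draws samples from it in two ways. Sampling with replacement yields i.i.d. draws, so the variance of the sample mean is $\sigma^2/n$ with $\sigma^2$ the population variance (divisor $N$). Sampling without replacement yields dependent draws, and the variance shrinks by the factor $(N-n)/(N-1)$, which equals the finite population correction once $\sigma^2$ is rewritten as $\frac{N-1}{N}\tilde\sigma^2$. The helper `var_of_sample_mean` is a name made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite population (illustrative values only)
y = [2, 4, 7, 8, 9]
N, n = len(y), 3
mu = Fraction(sum(y), N)
sigma2 = sum((yi - mu) ** 2 for yi in y) / N  # population variance (divisor N)


def var_of_sample_mean(samples):
    """Exact variance of the sample mean when every listed sample is equiprobable."""
    means = [Fraction(sum(s), n) for s in samples]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)


# With replacement: ordered draws are i.i.d. from the population distribution
var_with = var_of_sample_mean(list(product(y, repeat=n)))
# Without replacement: draws are dependent (no unit can be drawn twice)
var_without = var_of_sample_mean(list(combinations(y, n)))

print(var_with == sigma2 / n)                            # True: i.i.d. formula
print(var_without == var_with * Fraction(N - n, N - 1))  # True: finite-population shrinkage
```

In other words, sampling with replacement from a finite population is one way to obtain a sample that fits DeGroot's i.i.d. definition exactly.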


${}^1$ Say you are manufacturing light bulbs and wish to know their average life
span. Your "population" is just a theoretical or virtual one, at least if you
keep manufacturing light bulbs. So you have to model a data generation process
and interpret a set of light bulbs as a (random variable) sample. Say
now that you find a box of 1000 light bulbs and wish to know their average
life span. You can select a small set of light bulbs (a finite population
sample), but you could also select all of them. If you select a small sample, this
doesn't transform light bulbs into random variables: the randomness is
generated by you, as the choice between "all" and "a small set" is up to
you. However, when a finite population is very large (say, the population of your
country) and choosing "all" is not viable, the second situation is better
handled as the first one.

Attribution
Source: Link, Question Author: Silent, Answer Author: whuber
