I have been having a hard time understanding the meaning of “random sample” as well as “i.i.d. random variables”. I tried to find the meaning in several sources, but just got more and more confused. Here is what I tried and found out:

DeGroot’s *Probability and Statistics* says:

Random Samples/i.i.d./Sample Size: Consider a given probability distribution on the real line that can be represented by either a p.f. or a p.d.f. $f$. It is said that $n$ random variables $X_1,\dots,X_n$ form a random sample from this distribution if these random variables are independent and the marginal p.f. or p.d.f. of each of them is $f$. Such random variables are also said to be independent and identically distributed, abbreviated i.i.d. We refer to the number $n$ of random variables as the sample size.

But another statistics book I have says:

In random sampling, we guarantee that every individual unit in the population gets an equal chance (probability) of being selected.

So my feeling is that i.i.d. random variables are the elements that make up a random sample, and that the procedure for obtaining a random sample is random sampling. Am I right?

P.S.: I am very confused about this topic, so I would appreciate an elaborate reply. Thanks.

**Answer**

You don’t say what the other statistics book is, but I’d guess that it is a book (or section) about *finite population sampling*.

When you sample random variables, i.e. when you consider a set $X_1,\dots,X_n$ of $n$ random variables, you know that *if they are independent, $f(x_1,\dots,x_n)=f(x_1)\cdots f(x_n)$, and identically distributed*, in particular $E(X_i)=\mu$ and $\text{Var}(X_i)=\sigma^2$ for all $i$, then:

$$\overline{X}=\frac{\sum_i X_i}{n},\quad E(\overline{X})=\mu,\quad \text{Var}(\overline{X})=\frac{\sigma^2}{n}$$

where $\sigma^2$ is the second central moment.
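A quick Monte Carlo sketch can make the i.i.d. result concrete. It assumes, purely for illustration, a normal distribution with hypothetical values $\mu=10$, $\sigma=2$ and $n=25$ (none of these come from the text) and uses only Python's standard library. The simulated variance of the sample means should land close to $\sigma^2/n = 0.16$.

```python
import random
import statistics

def sample_mean_variance(mu, sigma, n, reps, seed=0):
    """Simulate `reps` i.i.d. samples of size n from N(mu, sigma^2) and
    return the mean and variance of the resulting sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # One i.i.d. sample: n independent draws from the same distribution
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(xs) / n)
    return statistics.fmean(means), statistics.variance(means)

# Hypothetical parameters chosen only for this illustration
mu, sigma, n = 10.0, 2.0, 25
m, v = sample_mean_variance(mu, sigma, n, reps=20_000)
print(f"E(Xbar)   ~ {m:.3f}  (theory: {mu})")
print(f"Var(Xbar) ~ {v:.4f} (theory: {sigma**2 / n})")
```

Increasing `reps` tightens the agreement; the normal distribution is only a convenience, since the $\sigma^2/n$ result needs nothing beyond independence and a common variance.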

Sampling a finite population is somewhat different. If the population is of size $N$, in simple random sampling without replacement there are $\binom{N}{n}$ possible samples $s_i$ of size $n$, and they are equiprobable:

$$p(s_i)=\frac{1}{\binom{N}{n}}\quad\forall i=1,\dots,\binom{N}{n}$$

For example, if $N=5$ and $n=3$, the sample space is $\{s_1,\dots,s_{10}\}$ and the possible samples are:

\begin{gather}s_1=\{1,2,3\},s_2=\{1,2,4\},s_3=\{1,2,5\},s_4=\{1,3,4\},s_5=\{1,3,5\},\\
s_6=\{1,4,5\},s_7=\{2,3,4\},s_8=\{2,3,5\},s_9=\{2,4,5\},s_{10}=\{3,4,5\}\end{gather}
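The ten samples for $N=5$, $n=3$ can be generated mechanically, and each individual's appearances counted, with a short standard-library Python sketch (the variable names are illustrative). `itertools.combinations` happens to yield the samples in the same order as $s_1,\dots,s_{10}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

N, n = 5, 3
population = range(1, N + 1)

# All possible samples without replacement (order ignored)
samples = list(combinations(population, n))

# Number of samples that contain each individual
counts = Counter(unit for s in samples for unit in s)

print(len(samples))  # 10 = C(5, 3)
for unit in population:
    # Each line prints: unit, 6, 3/5
    print(unit, counts[unit], Fraction(counts[unit], len(samples)))
```

Each individual appears in $\binom{4}{2}=6$ of the 10 samples, so its inclusion probability is $6/10 = n/N$.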

If you count the occurrences of each individual, you can see that each appears six times, i.e. each individual has an equal chance of being selected ($6/10$). So each $s_i$ is a random sample according to the second definition. Roughly, it is not an i.i.d. random sample because individuals are not random variables: you can consistently estimate $E[X]$ by a sample mean but will never know its exact value, whereas you *can* know the exact population mean if $n=N$ (let me repeat: roughly).${}^1$

Let $\mu$ be some population mean (mean height, mean income, …). When $n<N$ you can estimate $\mu$ as in random variable sampling:

$$\overline{y}_s=\frac{1}{n}\sum_{i\in s} y_i,\quad E(\overline{y}_s)=\mu$$
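Unbiasedness can be checked exactly, rather than by simulation, by averaging the sample mean over all $\binom{N}{n}$ equiprobable samples. The population values below are hypothetical (five made-up heights in cm), and `Fraction` is used so the comparison is exact.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population values y_1..y_N (not from the text)
y = [160, 165, 170, 180, 185]
N, n = len(y), 3

mu = Fraction(sum(y), N)  # population mean

# Sample mean of every possible sample s (indices drawn without replacement)
sample_means = [Fraction(sum(y[i] for i in s), n)
                for s in combinations(range(N), n)]

# All samples have probability 1 / C(N, n), so E(ybar_s) is a plain average
expected = sum(sample_means) / len(sample_means)
print(mu, expected)  # 172 172
```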

but the variance of the sample mean is different:

$$\text{Var}(\overline{y}_s)=\frac{\tilde\sigma^2}{n}\left(1-\frac{n}{N}\right)$$

where $\tilde\sigma^2$ is the population quasi-variance:

$$\tilde\sigma^2=\frac{\sum_{i=1}^N(y_i-\mu)^2}{N-1}.$$

The factor $(1-n/N)$ is usually called the "finite population correction factor".
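The variance formula with the finite population correction can likewise be verified exactly by enumerating every sample of a small population. This sketch reuses a hypothetical five-value population; `check_fpc` is an illustrative helper name, not a library function.

```python
from fractions import Fraction
from itertools import combinations

def check_fpc(y, n):
    """Return (exact Var(ybar_s) over all C(N, n) samples, formula value)."""
    N = len(y)
    mu = Fraction(sum(y), N)
    # Population quasi-variance: divisor N - 1
    quasi_var = sum((Fraction(v) - mu) ** 2 for v in y) / (N - 1)
    means = [Fraction(sum(y[i] for i in s), n)
             for s in combinations(range(N), n)]
    # All samples are equiprobable, so the variance is a plain average
    exact = sum((m - mu) ** 2 for m in means) / len(means)
    formula = quasi_var / n * (1 - Fraction(n, N))
    return exact, formula

y = [160, 165, 170, 180, 185]  # hypothetical population
for n in range(1, len(y) + 1):
    exact, formula = check_fpc(y, n)
    print(n, exact, formula, exact == formula)  # last column is True
```

Note the case $n=N$: the correction factor is 0 and so is the variance, because the only possible sample is the whole population and its mean is exactly $\mu$.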

This is a quick example of how a (random variable) i.i.d. random sample and a (finite population) random sample may differ. Statistical inference is mainly about random variable sampling; sampling theory is about finite population sampling.

${}^1$ Say you are manufacturing light bulbs and wish to know their average life span. Your "population" is just a theoretical or virtual one, at least if you keep manufacturing light bulbs. So you have to model a *data generation process* and interpret a set of light bulbs as a (random variable) sample. Now say that you find a box of 1000 light bulbs and wish to know their average life span. You can select a small set of light bulbs (a finite population sample), but you could select all of them. Selecting a small sample doesn't transform the light bulbs into random variables: the random variable is generated by you, as the choice between "all" and "a small set" is up to you. However, when a finite population is very large (say, your country's population) and choosing "all" is not viable, the second situation is better handled as the first one.
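As a rough numerical illustration of that last point, the finite population correction factor $(1-n/N)$ can be computed for a fixed sample size as $N$ grows. The sample size $n=100$ and the population sizes are hypothetical. The sketch only shows the factor approaching 1, at which point the finite population variance is essentially the i.i.d. formula (for large $N$, the divisor $N-1$ versus $N$ in the quasi-variance is also negligible).

```python
def fpc(n, N):
    """Finite population correction factor (1 - n/N)."""
    return 1 - n / N

n = 100  # hypothetical sample size
for N in (1_000, 100_000, 60_000_000):  # hypothetical population sizes
    print(f"N = {N:>11,}: 1 - n/N = {fpc(n, N):.6f}")
```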

**Attribution**: *Source: Link, Question Author: Silent, Answer Author: whuber*