What is the difference between the two, and why must the significance level always be greater than or equal to the size of the test?

**Answer**

Suppose you have a random sample $X_1,\dots,X_n$ from a distribution that involves a parameter $\theta$ which assumes values in a parameter space $\Theta$. You partition the parameter space as $\Theta=\Theta_0\cup\Theta_1$, and you want to test the hypotheses

$$

H_0 : \theta \in \Theta_0 \, ,

$$

$$

H_1 : \theta \in \Theta_1 \, ,

$$

which are called the **null** and **alternative** hypotheses, respectively.

Let $\mathscr{X}$ denote the sample space of all possible values of the random vector $X=(X_1,\dots,X_n)$. Your goal in building a test procedure is to partition this sample space $\mathscr{X}$ into two pieces: the **critical region** $\mathscr{C}$, containing the values of $X$ for which you will reject the null hypothesis $H_0$ (and, so, accept the alternative $H_1$), and the **acceptance region** $\mathscr{A}$, containing the values of $X$ for which you will not reject the null hypothesis $H_0$ (and, therefore, reject the alternative $H_1$).

Formally, a test procedure can be described as a measurable function $\varphi:\mathscr{X}\to\{0,1\}$, with the obvious interpretation in terms of the decisions made in favor of each of the hypotheses. The critical region is $\mathscr{C}=\varphi^{-1}(\{1\})$, and the acceptance region is $\mathscr{A}=\varphi^{-1}(\{0\})$.

For each test procedure $\varphi$, we define its power function $\pi_\varphi:\Theta\to[0,1]$ by

$$

\pi_\varphi(\theta) = \Pr(\varphi(X)=1\mid\theta) = \Pr(X\in\mathscr{C}\mid\theta) \, .

$$

In words, $\pi_\varphi(\theta)$ gives you the probability of rejecting $H_0$ when the parameter value is $\theta$.

The decision to reject $H_0$ when $\theta\in\Theta_0$ is **wrong**. So, for a given problem, you may want to consider only those test procedures $\varphi$ for which $\pi_\varphi(\theta)\leq\alpha$, for every $\theta\in\Theta_0$, in which $\alpha$ is some **significance level** ($0<\alpha<1$). Note that the significance level is a property of a **class** of test procedures. We can describe this class precisely as

$$

\mathscr{T}_{\alpha} = \left\{ \varphi\in\{0,1\}^{\mathscr{X}} : \pi_\varphi(\theta)\leq\alpha \;\text{ for every }\; \theta\in\Theta_0\right\} \, .

$$

For each **individual** test procedure $\varphi$, the maximum probability $\alpha_\varphi=\sup_{\theta\in\Theta_0}\pi_\varphi(\theta)$ of wrongly rejecting $H_0$ is called the **size** of the test procedure $\varphi$.

It follows directly from these definitions that, once we have fixed a significance level $\alpha$, and thereby determined the class $\mathscr{T}_{\alpha}$ of acceptable test procedures, every test procedure $\varphi$ in this class has size $\alpha_\varphi\leq\alpha$; conversely, every procedure whose size is at most $\alpha$ belongs to the class. Concisely, $\varphi\in\mathscr{T}_{\alpha}$ if and only if $\alpha_\varphi\leq\alpha$. This is exactly why the significance level can never be smaller than the size: the size is the supremum of the rejection probability over $\Theta_0$, and the significance level is any upper bound we impose on that supremum.
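The size-versus-level relationship can also be sketched numerically. Continuing the illustrative one-sided test above (reject $H_0$ when $\bar X > c$ for $N(\mu,1)$ data, $n=25$), take $\Theta_0=[-5,0]$ and approximate the supremum on a grid; the grid, cutoffs, and interval endpoints below are assumptions for the sketch, not part of the original answer.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(mu, c, n=25, sigma=1.0):
    # pi_phi(mu) = Pr(Xbar > c | mu) for iid N(mu, sigma^2) data.
    return 1.0 - normal_cdf((c - mu) * math.sqrt(n) / sigma)

def size(c, n=25, sigma=1.0):
    """Size alpha_phi = sup over Theta_0 of pi_phi(theta),
    approximated on a grid over Theta_0 = [-5, 0]."""
    grid = [-5.0 + 0.01 * k for k in range(501)]
    return max(power(mu, c, n, sigma) for mu in grid)

def in_class(c, alpha, n=25, sigma=1.0):
    # Membership in T_alpha: phi is in the class iff alpha_phi <= alpha.
    return size(c, n, sigma) <= alpha
```

Since the power here is increasing in $\mu$, the supremum over $\Theta_0$ is attained at the boundary $\mu=0$, so `size(0.329)` is about 0.05; hence `in_class(0.329, 0.05)` holds, while a laxer cutoff such as `c = 0.2` gives size about 0.16 and falls outside $\mathscr{T}_{0.05}$.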

**Attribution**
*Source: Link, Question Author: Fatsho, Answer Author: Zen*