What is the difference between the significance level and the size of a test, and why must the significance level always be greater than or equal to the size of the test?
Answer
Suppose you have a random sample $X_1,\dots,X_n$ from a distribution that involves a parameter $\theta$ which assumes values in a parameter space $\Theta$. You partition the parameter space as $\Theta=\Theta_0\cup\Theta_1$, and you want to test the hypotheses
$$
H_0 : \theta \in \Theta_0 \, ,
$$
$$
H_1 : \theta \in \Theta_1 \, ,
$$
which are called the null and alternative hypotheses, respectively.
Let $\mathscr{X}$ denote the sample space of all possible values of the random vector $X=(X_1,\dots,X_n)$. Your goal in building a test procedure is to partition this sample space $\mathscr{X}$ into two pieces: the critical region $\mathscr{C}$, containing the values of $X$ for which you will reject the null hypothesis $H_0$ (and, so, accept the alternative $H_1$), and the acceptance region $\mathscr{A}$, containing the values of $X$ for which you will not reject the null hypothesis $H_0$ (and, therefore, not accept the alternative $H_1$).
Formally, a test procedure can be described as a measurable function $\varphi:\mathscr{X}\to\{0,1\}$, with the obvious interpretation in terms of the decisions made in favor of each of the hypotheses. The critical region is $\mathscr{C}=\varphi^{-1}(\{1\})$, and the acceptance region is $\mathscr{A}=\varphi^{-1}(\{0\})$.
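To make this concrete, here is a minimal sketch (not part of the original answer) of one such test procedure: $n$ i.i.d. Bernoulli($\theta$) observations with $\Theta_0=[0,0.5]$ and $\Theta_1=(0.5,1]$, rejecting $H_0$ when the number of successes is large. The sample size $n=20$ and critical value $c=14$ are arbitrary illustrative choices.

```python
import numpy as np

n = 20   # sample size (illustrative choice)
c = 14   # critical value: reject H0 when the success count is >= c

def phi(x):
    """Test procedure phi: X -> {0, 1}.

    Returns 1 (reject H0) iff x lies in the critical region
    C = {x : sum(x) >= c}, and 0 (do not reject H0) otherwise.
    """
    return int(np.sum(x) >= c)

# One decision on a sample simulated under theta = 0.7 (inside Theta_1):
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.7, size=n)   # n Bernoulli(0.7) observations
print(phi(x))                      # 1 if H0 is rejected, 0 otherwise
```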
For each test procedure $\varphi$, we define its power function $\pi_\varphi:\Theta\to[0,1]$ by
$$
\pi_\varphi(\theta) = \Pr(\varphi(X)=1\mid\theta) = \Pr(X\in\mathscr{C}\mid\theta) \, .
$$
In words, $\pi_\varphi(\theta)$ gives you the probability of rejecting $H_0$ when the parameter value is $\theta$.
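Continuing the Bernoulli sketch above, $\sum_i X_i \sim \mathrm{Binomial}(n,\theta)$, so the power function has the closed form $\pi_\varphi(\theta)=\Pr(\mathrm{Binomial}(n,\theta)\geq c)$. We can evaluate it with scipy's binomial survival function (`sf(k)` gives $\Pr(X > k)$, so $\Pr(X \geq c)$ is `sf(c - 1)`):

```python
from scipy.stats import binom

def power(theta, n=20, c=14):
    """pi_phi(theta) = Pr(reject H0 | theta) = Pr(Binomial(n, theta) >= c)."""
    return binom.sf(c - 1, n, theta)

for theta in (0.3, 0.5, 0.7, 0.9):
    print(f"pi({theta}) = {power(theta):.4f}")
# For this one-sided test the power is increasing in theta: rejection
# becomes more likely the deeper theta lies inside Theta_1 = (0.5, 1].
```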
The decision to reject $H_0$ when $\theta\in\Theta_0$ is wrong (a Type I error). So, for a given problem, you may want to consider only those test procedures $\varphi$ for which $\pi_\varphi(\theta)\leq\alpha$ for every $\theta\in\Theta_0$, where $\alpha$ is some prescribed significance level ($0<\alpha<1$). Note that the significance level is a property of a whole class of test procedures. We can describe this class precisely as
$$
\mathscr{T}_{\alpha} = \left\{ \varphi\in\{0,1\}^{\mathscr{X}} : \pi_\varphi(\theta)\leq\alpha \;\textrm{for every}\; \theta\in\Theta_0 \right\} \, .
$$
For each individual test procedure $\varphi$, the supremum $\alpha_\varphi=\sup_{\theta\in\Theta_0}\pi_\varphi(\theta)$ of its probability of wrongly rejecting $H_0$ is called the size of the test procedure $\varphi$.
It follows directly from these definitions that, once we have established a significance level $\alpha$, and therefore determined the class $\mathscr{T}_{\alpha}$ of acceptable test procedures, each test procedure $\varphi$ within this class will have size $\alpha_\varphi\leq\alpha$, and conversely. Concisely, $\varphi\in\mathscr{T}_{\alpha}$ if and only if $\alpha_\varphi\leq\alpha$. This is precisely why the significance level is always greater than or equal to the size: $\alpha$ is an upper bound imposed on a whole class of tests, while $\alpha_\varphi$ is the exact supremal Type I error probability of one particular test in that class.
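In the running sketch, the size of $\varphi$ is the supremum of $\pi_\varphi$ over $\Theta_0=[0,0.5]$; since the power function is increasing in $\theta$, the supremum is attained at the boundary point $\theta=0.5$, as a grid search over $\Theta_0$ confirms:

```python
import numpy as np
from scipy.stats import binom

n, c = 20, 14
thetas_null = np.linspace(0.0, 0.5, 501)   # grid over Theta_0 = [0, 0.5]
powers = binom.sf(c - 1, n, thetas_null)   # pi_phi(theta) on the grid
size = powers.max()                        # alpha_phi = sup over Theta_0
print(f"size alpha_phi = {size:.4f}")      # approx. 0.0577, at theta = 0.5

# phi has significance level alpha exactly when its size does not exceed alpha:
print(size <= 0.10)   # True:  phi belongs to the class T_0.10
print(size <= 0.05)   # False: phi does not belong to T_0.05
```

As the last two lines show, the same test has significance level $0.10$ but not $0.05$: the size is a single number attached to $\varphi$, while a significance level is any upper bound $\alpha$ that the size respects.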
Attribution
Source: Link, Question Author: Fatsho, Answer Author: Zen