# Size of a test and level of significance

What is the difference between the two, and why must the significance level always be greater than or equal to the size of the test?

Suppose you have a random sample $$X_1,\dots,X_n$$ from a distribution that involves a parameter $$\theta$$ which assumes values in a parameter space $$\Theta$$. You partition the parameter space as $$\Theta=\Theta_0\cup\Theta_1$$, and you want to test the hypotheses
$$H_0 : \theta \in \Theta_0 \, ,$$
$$H_1 : \theta \in \Theta_1 \, ,$$
which are called the null and alternative hypotheses, respectively.

Let $$\mathscr{X}$$ denote the sample space of all possible values of the random vector $$X=(X_1,\dots,X_n)$$. Your goal in building a test procedure is to partition this sample space $$\mathscr{X}$$ into two pieces: the critical region $$\mathscr{C}$$, containing the values of $$X$$ for which you will reject the null hypothesis $$H_0$$ (and so accept the alternative $$H_1$$), and the acceptance region $$\mathscr{A}$$, containing the values of $$X$$ for which you will not reject the null hypothesis $$H_0$$ (and therefore reject the alternative $$H_1$$).

Formally, a test procedure can be described as a measurable function $$\varphi:\mathscr{X}\to\{0,1\}$$, with the obvious interpretation in terms of the decisions made in favor of each of the hypotheses. The critical region is $$\mathscr{C}=\varphi^{-1}(\{1\})$$, and the acceptance region is $$\mathscr{A}=\varphi^{-1}(\{0\})$$.

For each test procedure $$\varphi$$, we define its power function $$\pi_\varphi:\Theta\to[0,1]$$ by
$$\pi_\varphi(\theta) = \Pr(\varphi(X)=1\mid\theta) = \Pr(X\in\mathscr{C}\mid\theta) \, .$$
In words, $$\pi_\varphi(\theta)$$ gives you the probability of rejecting $$H_0$$ when the parameter value is $$\theta$$.
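As a minimal sketch of these definitions, assume a toy setting that is not part of the discussion above: $$n=5$$ independent Bernoulli($$\theta$$) trials, and a hypothetical test `phi` that rejects $$H_0$$ when at least 4 trials succeed. The sample space is small enough to enumerate, so the critical region and the power function can be computed directly from their definitions.

```python
from itertools import product

N = 5  # hypothetical sample size (an assumption for this illustration)


def phi(x):
    """Hypothetical test: return 1 (reject H0) when at least 4 of the N trials succeed."""
    return 1 if sum(x) >= 4 else 0


def prob(x, theta):
    """Probability of observing the sample x under i.i.d. Bernoulli(theta) trials."""
    successes = sum(x)
    return theta**successes * (1 - theta) ** (len(x) - successes)


def power(theta):
    """pi_phi(theta): total probability of the critical region when the parameter is theta."""
    return sum(prob(x, theta) for x in product((0, 1), repeat=N) if phi(x) == 1)


# The critical region is the preimage of {1} under phi.
critical_region = [x for x in product((0, 1), repeat=N) if phi(x) == 1]

for theta in (0.2, 0.5, 0.8):
    print(f"theta = {theta}: P(reject H0) = {power(theta):.4f}")
```

In this sketch the critical region contains 6 of the 32 possible samples, and the probability of rejecting $$H_0$$ grows as $$\theta$$ grows.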

Rejecting $$H_0$$ when $$\theta\in\Theta_0$$ is an error (a type I error). So, for a given problem, you may want to consider only those test procedures $$\varphi$$ for which $$\pi_\varphi(\theta)\leq\alpha$$ for every $$\theta\in\Theta_0$$, where $$\alpha$$ is a chosen significance level ($$0<\alpha<1$$). Note that the significance level is a property of a class of test procedures. We can describe this class precisely as
$$\mathscr{T}_{\alpha} = \left\{ \varphi\in\{0,1\}^{\mathscr{X}} : \pi_\varphi(\theta)\leq\alpha \;\; \textrm{for every}\;\; \theta\in\Theta_0\right\} \, .$$

For each individual test procedure $$\varphi$$, the supremum $$\alpha_\varphi=\sup_{\theta\in\Theta_0}\pi_\varphi(\theta)$$ of the probabilities of wrongly rejecting $$H_0$$ (the maximum such probability, whenever it is attained) is called the size of the test procedure $$\varphi$$.

It follows directly from these definitions that, once we have established a significance level $$\alpha$$, and therefore determined the class $$\mathscr{T}_{\alpha}$$ of acceptable test procedures, each test procedure $$\varphi$$ within this class will have size $$\alpha_\varphi\leq\alpha$$, and conversely. Concisely, $$\varphi\in\mathscr{T}_{\alpha}$$ if and only if $$\alpha_\varphi\leq\alpha$$.
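To see that the size can be strictly smaller than the significance level, here is a small sketch under assumptions of my own choosing: $$X\sim\text{Binomial}(10,\theta)$$, $$\Theta_0=[0,0.5]$$, and tests that reject $$H_0$$ when $$X\geq c$$. The function names `power`, `size`, and `in_class` are hypothetical, and the supremum over $$\Theta_0$$ is only approximated on a grid that includes the boundary point $$0.5$$.

```python
from math import comb


def power(theta, n, c):
    """P(X >= c | theta) for X ~ Binomial(n, theta): probability of rejecting H0."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(c, n + 1))


def size(n, c, theta0=0.5, grid_points=1001):
    """Approximate sup of the power function over Theta_0 = [0, theta0] on a grid."""
    thetas = [theta0 * i / (grid_points - 1) for i in range(grid_points)]
    return max(power(t, n, c) for t in thetas)


def in_class(n, c, alpha, theta0=0.5):
    """True when the test belongs to the class of level-alpha tests (size <= alpha)."""
    return size(n, c, theta0) <= alpha


alpha = 0.05  # chosen significance level
for c in (8, 9):
    print(f"c = {c}: size = {size(10, c):.4f}, level-{alpha} test: {in_class(10, c, alpha)}")
```

With these assumptions, the test with $$c=8$$ has size $$56/1024\approx 0.0547$$ and so falls outside $$\mathscr{T}_{0.05}$$, while the test with $$c=9$$ has size $$11/1024\approx 0.0107$$ and belongs to it. Because $$X$$ is discrete, no cutoff $$c$$ gives a size of exactly $$0.05$$, which is why the level is stated as an upper bound on the size rather than as an equality.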