# Why don’t people trade significance level for power?

As a convention, many studies use a significance level of $\alpha = 0.05$ and a power of $0.8$. However, it is extremely rare to find a study with $\alpha = 0.2$ and a power of $0.95$.

From my understanding, after an experiment has been conducted, the significance level doesn’t matter at all if the result is non-significant, because in this case we are considering whether it makes sense to accept the null, and all we care about is the power. Similarly, if the result is significant, then the significance level becomes your evidence, while the power of the test makes absolutely zero difference. (By “doesn’t matter”, I mean “doesn’t matter for the purposes of this experiment”. Both the significance level and the power are important for meta-analyses, so please report both in your publication!)

If I’m correct, then the null and the alternative are to some extent symmetrical: the null hypothesis doesn’t inherently require more protection. If you want to prove the alternative, say “this new drug has an effect on the patients”, then use a very small $\alpha$ and moderately high power. On the other hand, when you want to prove the null, for example in a normality test, you should choose a moderately small $\alpha$ and very high power, so that you can confidently accept the null.

Why are experiments with moderately small $\alpha$ and very high power so rare?

> Why are experiments with moderately small $\alpha$ and very high power so rare?

This is all a bit relative, but one could certainly argue that the significance level $\alpha = 0.05$ is already weak, and already constitutes a sacrifice made for higher power (e.g., relative to $\alpha = 0.01$ or other lower significance levels). Opinions on this will differ, but in my own view $\alpha = 0.05$ is a very weak significance level, so choosing it at all is already a trade-off in favour of power.

> From my understanding, after an experiment has been conducted, the significance level doesn’t matter at all if the result is non-significant, because in this case, we are considering whether it makes sense to accept the null, and all we care about is the power. Similarly, if the result is significant, then the significance level becomes your evidence, while the power of the test makes absolutely zero difference.

I can see why you might think this, but it is not really true. In classical hypothesis testing there is quite a complex and subtle interaction in these things. Remember that both the p-value and the power pertain to probabilities that condition on the true state of the hypotheses (the p-value conditions on the null, and the power conditions on the alternative). When you get your result from the data, you make an inference about the hypotheses, but you still don’t know their true state. Thus, it is not really legitimate to say that you can completely ignore the “other half” of the test. Regardless of whether the result is statistically significant or not, the interpretation of that result is made holistically, with respect to all the properties of the test.
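A small simulation can illustrate this interaction (a sketch of my own, with assumed numbers: a one-sided z-test with standardized effect size $0.5$, $n = 25$, and a 50/50 chance that the alternative is true). Even among significant results, some rejections come from the null, and how many depends on the power as well as on $\alpha$ — so a significant result cannot be interpreted by looking at $\alpha$ alone:

```python
import random
from statistics import NormalDist

rng = random.Random(0)
alpha, effect, n, n_sims = 0.05, 0.5, 25, 200_000
z_crit = NormalDist().inv_cdf(1 - alpha)          # rejection threshold under H0

null_rejections = alt_rejections = 0
for _ in range(n_sims):
    h1_true = rng.random() < 0.5                  # half the experiments have a real effect
    # z-statistic: N(0, 1) under H0, N(effect * sqrt(n), 1) under H1
    z = rng.gauss((effect * n ** 0.5) if h1_true else 0.0, 1.0)
    if z > z_crit:
        if h1_true:
            alt_rejections += 1
        else:
            null_rejections += 1

# Share of significant results that are actually false positives;
# this depends on the power of the test, not just on alpha.
share_false = null_rejections / (null_rejections + alt_rejections)
print(f"false positives among significant results: {share_false:.3f}")
```

With these assumed numbers the power is high, so only a small fraction of the significant results are false positives; lower the power and that fraction grows, even though $\alpha$ is unchanged.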

It is also worth noting that, for a fixed model and test, and a fixed sample size, the power function is a function of the chosen significance level. The chosen significance level determines the rejection region, which directly affects the power of the test. So again, there is a relationship between these things, and you cannot ignore “one half” of the properties of the test.
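This dependence is easy to make concrete. Here is a minimal sketch for a one-sided z-test (the effect size $0.5$ and sample size $n = 25$ are my own illustrative choices, not from the question), showing how the power changes as the significance level alone is varied:

```python
from statistics import NormalDist

def ztest_power(alpha, effect_size, n):
    """Power of a one-sided z-test: P(reject H0 | H1 true),
    for a given standardized effect size and sample size n."""
    z_crit = NormalDist().inv_cdf(1 - alpha)      # rejection threshold set by alpha
    return 1 - NormalDist().cdf(z_crit - effect_size * n ** 0.5)

# Same model, same data; only the chosen significance level varies:
for alpha in (0.01, 0.05, 0.2):
    print(f"alpha = {alpha}: power = {ztest_power(alpha, 0.5, 25):.3f}")
```

With these particular numbers the output roughly reproduces the pairs from the question: $\alpha = 0.05$ gives power $\approx 0.80$, while loosening to $\alpha = 0.2$ gives power $\approx 0.95$.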

Finally, it is also important to note that practitioners conducting a classical statistical test will often just report the p-value and leave it to the reader to choose their own significance level if a binary decision is required. (That is my preferred approach unless there is a specific need to make an immediate binary conclusion.) Modern statistical literature cautions strongly against reducing reported outcomes of hypothesis tests to a binary without also giving the underlying p-value. So in many practical cases, the significance level is not chosen prior to the analysis, and might not be chosen by the analyst conducting the test at all.