# Sidak or Bonferroni?

I am using a generalized linear model in SPSS to look at the differences in average number of caterpillars (non-normal, using Tweedie distribution) on 16 different species of plants.

I want to run multiple comparisons but I’m not sure if I should use a Sidak or a Bonferroni correction. What is the difference between the two? Is one better than the other?

If you run $k$ independent statistical tests using $\alpha$ as your significance level, and the null obtains in every case, whether or not you find ‘significance’ in any given test is simply a draw from a random variable; specifically, the number of ‘significant’ results follows a binomial distribution with $p=\alpha$ and $n=k$. For example, if you plan to run 3 tests using $\alpha=.05$, and (unbeknownst to you) there is actually no difference in each case, then each individual test has a 5% chance of yielding a significant result. In this way, the type I error rate is held to $\alpha$ for the tests individually, but across the set of 3 tests the long-run type I error rate is higher: the probability of at least one type I error is $1-(1-.05)^3\approx 14\%$. If you believe that it is meaningful to group / think of these 3 tests together, then you may want to hold the type I error rate at $\alpha$ for the set as a whole, rather than just individually. How should you go about this? There are two approaches, both of which center on shifting from the original $\alpha$ (i.e., $\alpha_o$) to a new value (i.e., $\alpha_{\rm new}$):
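To make that arithmetic concrete, here is a minimal Python sketch (the values $k=3$ and $\alpha=.05$ are just the illustrative numbers from the paragraph above) that computes the familywise type I error rate analytically and checks it by simulation:

```python
import numpy as np

alpha = 0.05   # per-test significance level
k = 3          # number of independent tests, each with a true null

# Analytic familywise error rate: P(at least one 'significant' result)
fwer = 1 - (1 - alpha) ** k
print(f"Familywise type I error rate: {fwer:.3f}")   # ~0.143

# Simulation check: each row is one 'experiment' of k independent tests;
# a familywise type I error occurs when at least one test is 'significant'
rng = np.random.default_rng(0)
significant = rng.random((100_000, k)) < alpha
print(f"Simulated rate: {significant.any(axis=1).mean():.3f}")
```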

Bonferroni: adjust the $\alpha$ used to assess ‘significance’ such that

$$\alpha_{\rm new}=\frac{\alpha_{o}}{k}$$

Dunn-Sidak: adjust $\alpha$ using

$$\alpha_{\rm new}=1-(1-\alpha_{o})^{1/k}$$

(Note that the Dunn-Sidak correction assumes all the tests within the set are independent of one another, and it may fail to control the familywise type I error rate if that assumption does not hold.)
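To see how the two corrections compare numerically, here is a quick Python sketch. The choice $k=\binom{16}{2}=120$ is only an assumption for illustration (all pairwise comparisons among the 16 plant species); your actual $k$ depends on which comparisons you request in SPSS:

```python
from math import comb

def bonferroni_alpha(alpha, k):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k

def sidak_alpha(alpha, k):
    """Per-test significance level under the Dunn-Sidak correction."""
    return 1 - (1 - alpha) ** (1 / k)

alpha = 0.05
k = comb(16, 2)  # 120 pairwise comparisons among 16 species (illustrative)
print(f"Bonferroni: {bonferroni_alpha(alpha, k):.6f}")  # ~0.000417
print(f"Dunn-Sidak: {sidak_alpha(alpha, k):.6f}")       # ~0.000427
```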

It is important to note that when conducting tests, there are two kinds of errors that you want to avoid, type I (i.e., saying there is a difference when there isn’t one) and type II (i.e., saying there isn’t a difference when there actually is). Typically, when people discuss this topic, they only discuss—and seem to only be aware of / concerned with—type I errors. In addition, people often neglect to mention that the calculated error rate will only hold if all nulls are true. It is trivially obvious that you cannot make a type I error if the null hypothesis is false, but it is important to hold that fact explicitly in mind when discussing this issue.

I bring this up because these facts have implications that often seem to go unconsidered. First, if $k>1$, the Dunn-Sidak approach will offer higher power (although the difference is typically negligible) and so should be preferred when it applies. Second, a ‘step-down’ approach should be used. That is, test the biggest effect first; if you are convinced that the null does not obtain in that case, then the maximum possible number of type I errors is $k-1$, so the next test should be adjusted accordingly, and so on. (This often makes people uncomfortable and looks like fishing, but it is not fishing, as the tests are independent and you intended to conduct them before you ever saw the data. This is just a way of adjusting $\alpha$ optimally.)
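To illustrate the step-down idea, here is a sketch of what is essentially the Holm-Sidak procedure; the p-values are hypothetical, chosen only to show the mechanics:

```python
def sidak_step_down(p_values, alpha=0.05):
    """Holm-Sidak step-down: test the smallest p-value against the full
    k-test Sidak-adjusted alpha, the next smallest against a (k-1)-test
    adjustment, and so on, stopping at the first non-rejection."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for step, i in enumerate(order):
        # Sidak adjustment for the number of hypotheses still in play
        alpha_step = 1 - (1 - alpha) ** (1 / (k - step))
        if p_values[i] <= alpha_step:
            reject[i] = True
        else:
            break  # all remaining (larger) p-values fail as well
    return reject

# Hypothetical p-values, just to show the mechanics
print(sidak_step_down([0.001, 0.013, 0.021, 0.20]))  # [True, True, True, False]
```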

The above holds no matter how you value type I relative to type II errors. However, there is no a-priori reason to believe that type I errors are worse than type II errors (despite the fact that everyone seems to assume so). Instead, this is a decision that must be made by the researcher and must be specific to the situation at hand. Personally, if I am running theoretically-suggested, a-priori, orthogonal contrasts, I don’t usually adjust $\alpha$.

(And to state this again, because it’s important, all of the above assumes that the tests are independent. If the contrasts are not independent, such as when several treatments are each being compared to the same control, a different approach than $\alpha$ adjustment, such as Dunnett’s test, should be used.)
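If you do end up in that many-to-one situation and want to check things outside SPSS, recent versions of SciPy (1.11+, if I recall correctly) provide `scipy.stats.dunnett`. A minimal sketch with simulated data, purely to show the mechanics:

```python
import numpy as np
from scipy import stats  # scipy.stats.dunnett needs a recent SciPy (1.11+)

# Simulated data purely for illustration: one control group and two
# treatment groups, each compared back to the same control
rng = np.random.default_rng(1)
control = rng.normal(10, 2, size=20)
treat_a = rng.normal(11, 2, size=20)
treat_b = rng.normal(13, 2, size=20)

res = stats.dunnett(treat_a, treat_b, control=control)
print(res.pvalue)  # one adjusted p-value per treatment-vs-control comparison
```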