I’d like to see an extension of this discussion of the age-old chi-sq vs. Fisher’s exact test debate, broadening the scope a bit. There are many, many tests for interactions in a contingency table, enough to make my head spin. I’m hoping to get an explanation of what test I should use and when, and of course an explanation as to why one test should be preferred over another.
My current problem is the classic n×m case, but answers regarding higher dimensionality are welcome, as are tips for implementing the various solutions in R, at least in cases where it is not obvious how to proceed.
Below I’ve listed all the tests I’m aware of; I hope by exposing my errors they can be corrected.
χ2. The old standby. There are three major options here:
- The correction built into R for 2×2 tables: “one half is subtracted from all |O−E| differences.” Should I always be doing this?
- The “N−1” χ2 test; I’m not sure how to do this in R.
- Monte Carlo simulation. Is this always best? Why does R not give me df when I do this?
Fisher’s exact test:
- Traditionally advised when any cell has an expected count below 4, but apparently some dispute this advice.
- Is the (usually false) assumption that the marginals are fixed really the biggest problem with this test?
Another exact test, except I’ve never heard of it.
GLMs (e.g., Poisson regression):
- One thing that always confuses me about GLMs is exactly how to do significance tests, so help on that would be appreciated. Is it best to do a nested model comparison? What about a Wald test for a particular predictor?
- Should I really just always be doing Poisson regression? What’s the practical difference between this and a χ2 test?
This is a good question, but a big one. I don’t think I can provide a complete answer, but I will throw out some food for thought.
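First, under your top bullet point, the correction you are referring to is known as Yates’ correction for continuity. The problem is that we calculate a discrete inferential statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$ of an $r \times c$ table.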
(It is discrete because, with only a finite number of instances represented in a contingency table, there is only a finite number of possible realized values that this statistic can take on.) Notwithstanding this fact, it is compared to a continuous reference distribution (viz., the χ2 distribution with (r−1)(c−1) degrees of freedom). This necessarily leads to a mismatch on some level. With a particularly small data set, especially one in which some cells have expected counts below 5, the p-value can be too small. Yates’ correction adjusts for this by subtracting 0.5 from each |O−E| before squaring, which shrinks the statistic and makes the test more conservative.
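To make the statistic and the correction concrete, here is a minimal sketch in Python (standard library only) that computes Pearson’s statistic for a 2×2 table with and without Yates’ correction, plus the “N−1” variant from the question, assuming its usual definition of χ2·(N−1)/N. The table is made-up illustrative data, and the helper names are hypothetical. The p-value uses the identity P(χ2₁ > x) = erfc(√(x/2)), which holds only for 1 degree of freedom. In R, `chisq.test(x, correct = TRUE)` applies the Yates correction to 2×2 tables by default.

```python
import math

def expected_counts(table):
    """Expected counts under independence: E_ij = (row total i) * (column total j) / N."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]

def pearson_x2(table, yates=False):
    """Pearson's statistic sum((O - E)^2 / E), optionally with Yates' continuity correction."""
    x2 = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            d = abs(o - e)
            if yates:
                d = max(d - 0.5, 0.0)  # subtract 1/2 from |O - E|, but never below zero
            x2 += d * d / e
    return x2

def n_minus_1_x2(table):
    """'N-1' chi-squared (assumed definition): Pearson's statistic scaled by (N - 1) / N."""
    n = sum(sum(row) for row in table)
    return pearson_x2(table) * (n - 1) / n

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

table = [[9, 1], [3, 7]]  # hypothetical 2x2 counts
for label, stat in [("Pearson", pearson_x2(table)),
                    ("Yates", pearson_x2(table, yates=True)),
                    ("N-1", n_minus_1_x2(table))]:
    print(f"{label:8s} X2 = {stat:.4f}  p = {chi2_sf_df1(stat):.4f}")
```

For this table every |O−E| equals 3, so the three statistics are 7.5, 125/24 (about 5.208) and 7.125: the continuity correction moves the statistic, and hence the p-value, much more than the N−1 scaling does.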
Ironically, the same underlying problem (the discrete-continuous mismatch) can also lead to p-values that are too high. Specifically, the p-value is conventionally defined as the probability of getting data that are as extreme as, or more extreme than, the observed data. With continuous data, the probability of getting any exact value is vanishingly small, so in effect we have the probability of data that are more extreme. With discrete data, however, there is a finite probability of getting data exactly like yours. Counting only the probability of data more extreme than yours yields nominal p-values that are too low (leading to increased type I errors), but including the probability of data the same as yours yields nominal p-values that are too high (leading to increased type II errors). These facts prompt the idea of the mid p-value: the probability of data more extreme than yours plus half the probability of data just the same as yours.
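As a minimal sketch of the difference, the Python below computes a conventional and a mid p-value from Fisher’s exact (hypergeometric) distribution for a 2×2 table. To sidestep the choice of how to order tables by “extremeness” in a two-sided test, it assumes a one-sided alternative (larger top-left count); the function names and the example table are hypothetical. Exact fractions are used so the two p-values can be compared without rounding. The conventional value should correspond to R’s `fisher.test(x, alternative = "greater")`, which does not offer a mid-p option.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(a, r1, r2, c1):
    """P(top-left cell = a) with row totals r1, r2 and first-column total c1 held fixed."""
    return Fraction(comb(r1, a) * comb(r2, c1 - a), comb(r1 + r2, c1))

def fisher_upper_tail(table):
    """One-sided (upper-tail) Fisher p-value and mid p-value for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    hi = min(r1, c1)  # largest top-left count compatible with the margins
    p_more = sum((hypergeom_pmf(k, r1, r2, c1) for k in range(a + 1, hi + 1)), Fraction(0))
    p_same = hypergeom_pmf(a, r1, r2, c1)
    conventional = p_more + p_same  # P(as extreme or more extreme)
    mid_p = p_more + p_same / 2     # P(more extreme) + half of P(exactly as extreme)
    return conventional, mid_p

conv, mid = fisher_upper_tail([[9, 1], [3, 7]])
print(float(conv), float(mid))
```

Here the observed table has probability 1200/125970 and the only more extreme table has 45/125970, so the conventional p-value is 83/8398 and the mid p-value is 43/8398, roughly half as large.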
As you point out, there are many possibilities for testing contingency table data. The most comprehensive treatment of the pros and cons of the various approaches is here. That paper is specific to 2×2 tables, but you can still learn a lot about the options for contingency table data by reading it.
I also think it’s worth considering models seriously. Older tests like chi-squared are quick, easy, and understood by many people, but they do not leave you with as comprehensive an understanding of your data as you get from building an appropriate model. If it is reasonable to think of the rows [columns] of your contingency table as a response variable, and the columns [rows] as explanatory/predictor variables, a modeling approach follows quite readily. For instance, if you have just two rows, you can build a logistic regression model; if there are several columns, you can use reference cell coding (dummy coding) to build an ANOVA-type model. If you have more than two rows, multinomial logistic regression can be used in the same manner. Should your rows have an intrinsic order, ordinal logistic regression will typically perform better than the multinomial model, because it exploits that ordering. The log-linear model (Poisson regression) is probably less relevant unless you have contingency tables with more than two dimensions, in my opinion.
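A sketch of the two-row case: with a dummy-coded predictor, the logistic model has closed-form maximum-likelihood estimates (each column’s fitted probability is just its observed proportion, and the intercept-only model uses the pooled proportion), so the nested-model likelihood-ratio test can be computed without an iterative fitter. The Python below assumes row 0 holds “success” counts and row 1 “failure” counts, one column per predictor level, every column non-empty; the function names are hypothetical. In R you would normally fit both models with `glm(..., family = binomial)` and compare them with `anova(null_fit, full_fit, test = "Chisq")`.

```python
import math

def binom_loglik(successes, trials, p):
    """Binomial log-likelihood, omitting the constant log C(trials, successes)."""
    ll = 0.0
    if successes > 0:
        ll += successes * math.log(p)
    if trials > successes:
        ll += (trials - successes) * math.log(1.0 - p)
    return ll

def logistic_lr_test(table):
    """Likelihood-ratio test for a dummy-coded column effect in a logistic model.

    `table` is 2 x c: row 0 holds the 'success' counts and row 1 the 'failure'
    counts for each column (each column is one level of the predictor).
    """
    successes, failures = table
    trials = [s + f for s, f in zip(successes, failures)]
    # Intercept + (c - 1) dummies: each column's MLE probability is its own proportion.
    ll_full = sum(binom_loglik(s, n, s / n) for s, n in zip(successes, trials))
    # Intercept only: a single pooled proportion shared by every column.
    p_pooled = sum(successes) / sum(trials)
    ll_null = sum(binom_loglik(s, n, p_pooled) for s, n in zip(successes, trials))
    lr = 2.0 * (ll_full - ll_null)
    df = len(successes) - 1  # number of dummy coefficients dropped
    return lr, df

lr, df = logistic_lr_test([[9, 1], [3, 7]])
print(f"LR = {lr:.4f} on {df} df")
```

Because the dummy-coded model reproduces each column’s proportion exactly, this statistic coincides with the likelihood-ratio (G2) statistic for the table; the payoff of the model is that it extends naturally to covariates, interactions, and odds-ratio estimates.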
For a comprehensive treatment of topics like these, the best sources are the books by Agresti: either his full-scale treatment (more rigorous), his intro book (easier but still comprehensive and very good), or possibly also his ordinal book.
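Update: For the sake of completeness of the list of possible tests, it occurs to me that we can add the likelihood ratio test (often called the ‘G2-test’). It is:
$$G^2 = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\ln\frac{O_{ij}}{E_{ij}}$$
with $O_{ij}$ and $E_{ij}$ the observed and expected cell counts, as for the Pearson statistic (cells with $O_{ij}=0$ contribute 0).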
This is also asymptotically distributed as χ2 with (r−1)(c−1) degrees of freedom, and will almost always yield the same decision. The realized values of the two statistics will typically be similar, but slightly different. The question of which is more powerful in a given situation is quite subtle. I gather it is the default choice by tradition in some fields. I do not necessarily advocate its use over the traditional test; I’m only listing it for completeness, as I say.
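For a side-by-side comparison, here is a minimal Python sketch (standard library only, hypothetical function names) that computes X2 and G2 for an r×c table along with their shared (r−1)(c−1) degrees of freedom. It assumes every row and column total is positive. To avoid a SciPy dependency, the upper-tail χ2 probability for integer df uses the standard closed-form series; in practice `scipy.stats.chi2.sf` or R’s `pchisq(x, df, lower.tail = FALSE)` would do the same job.

```python
import math

def expected_counts(table):
    """Expected counts under independence: E_ij = (row total i) * (column total j) / N."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]

def x2_and_g2(table):
    """Pearson X^2 and likelihood-ratio G^2 for an r x c table, with df = (r-1)(c-1)."""
    x2 = g2 = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            x2 += (o - e) ** 2 / e
            if o > 0:  # a zero cell contributes 0 to G^2 (0 * log 0 taken as 0)
                g2 += 2.0 * o * math.log(o / e)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df

def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer df, via closed-form series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt x) * sum_k x^((2k-1)/2) / (1*3*...*(2k-1))
    s = math.sqrt(x)
    total = math.erfc(s / math.sqrt(2.0))
    term = s * math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

table = [[12, 5, 9], [4, 10, 8]]  # hypothetical 2x3 counts
x2, g2, df = x2_and_g2(table)
print(f"X2 = {x2:.4f} (p = {chi2_sf(x2, df):.4f}), G2 = {g2:.4f} (p = {chi2_sf(g2, df):.4f}), df = {df}")
```

Both statistics are referred to the same χ2 distribution, so any difference in the decision comes only from the small gap between their realized values.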