# Comparison of statistical tests exploring co-dependence of two binary variables

Suppose we observe data $$(X_i, Y_i)_{i=1,\dots,n}$$ on two binary variables: $$X \in \{0,1\}$$ and $$Y \in \{0,1\}$$. We would like to test whether $$X$$ and $$Y$$ are co-dependent (related). Standard suggestions in mainstream textbooks are the following:

• chi-square test for independence of $$X$$ and $$Y$$,
• Z-test for comparing proportions of $$[Y=1]$$ between two groups: $$[X=0]$$ and $$[X=1]$$,
• Z-test for comparing proportions of $$[X=1]$$ between two groups: $$[Y=0]$$ and $$[Y=1]$$.

In addition to that, we can run logistic regressions of $$Y$$ on $$X$$ and of $$X$$ on $$Y$$ and check the statistical significance of the slope coefficients. There are at least $$3$$ standard tests for that: likelihood ratio, Wald and deviance. Since we consider two regressions, there are $$3 \times 2 = 6$$ tests added, making the total number $$9$$. But wait, we can run probit models too. Et cetera, et cetera, …
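As a quick numerical illustration of how some of these candidates collapse into one, the chi-square test (without continuity correction) and the pooled two-proportion Z-test are algebraically the same test: the squared Z statistic equals the chi-square statistic. A minimal sketch, with made-up counts:

```python
# Sketch (illustrative 2x2 counts): the Pearson chi-square statistic with no
# continuity correction equals the square of the pooled two-proportion z.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 20],   # X=0 row: counts of Y=0, Y=1
                  [10, 40]])  # X=1 row: counts of Y=0, Y=1

chi2, p, dof, _ = chi2_contingency(table, correction=False)

# Two-proportion z-test of P(Y=1 | X=1) vs P(Y=1 | X=0), pooled variance
n0, n1 = table.sum(axis=1)
p0, p1 = table[0, 1] / n0, table[1, 1] / n1
pooled = table[:, 1].sum() / (n0 + n1)
z = (p1 - p0) / np.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))

print(np.isclose(z**2, chi2))  # True: z^2 equals the chi-square statistic
```

By symmetry, the Z-test comparing proportions of $$[X=1]$$ across $$[Y=0]$$ versus $$[Y=1]$$ yields the same statistic, so all three textbook suggestions coincide here.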

Are there one or more references that systematically and rigorously answer the following questions:

• Which tests are algebraically equivalent and when?
• Which tests are most powerful, why and when?
• What is the power function for each test and each sample size?
• In practical terms, which tests deliver the same verdict almost always?

Tan et al., in Information Systems 29 (2004), 293–313, consider 21 different measures of association patterns between two binary variables. Each of these measures has its strengths and weaknesses. As the authors state in the abstract:

Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. … In this paper, we describe several key properties one should examine in order to select the right measure for a given application. … We show that depending on its properties, each measure is useful for some application, but not for others.

The major issues concern the type of association one is interested in and the properties one wishes the measure to maintain. For example, if P(X) is the probability of X = 1, P(Y) is the probability of Y = 1, and P(X,Y) is the probability that both are 1 in a 2 × 2 contingency table, which of the following matter to you in a measure of association:

• Is the measure 0 when X and Y are statistically independent?
• Does the measure increase with P(X,Y) when P(X) and P(Y) are held constant?
• Does the measure decrease monotonically in either P(X) or P(Y) while the other probabilities remain constant?
• Is the measure symmetric under permutation of X and Y?
• Is it invariant to row and column scaling?
• Is it antisymmetric under row or column permutation?
• Is it invariant when both rows and columns are swapped?
• Is it invariant when extra cases in which both X and Y are 0 are added to the data?
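Properties like these can be checked directly for any candidate measure. A minimal sketch using the phi coefficient as an example measure (the function and probabilities below are illustrative, not from Tan et al.):

```python
# Sketch: verifying two of the listed properties for the phi coefficient,
# written in terms of P(X=1,Y=1)=p11, P(X=1)=px, P(Y=1)=py (illustrative).
import numpy as np

def phi(p11, px, py):
    """Phi coefficient of a 2x2 table given the marginal and joint probabilities."""
    return (p11 - px * py) / np.sqrt(px * (1 - px) * py * (1 - py))

px, py = 0.3, 0.6

# Property: the measure is 0 when X and Y are statistically independent,
# i.e. when p11 = px * py.
print(phi(px * py, px, py))  # 0.0

# Property: the measure is symmetric under permutation of X and Y.
print(np.isclose(phi(0.25, px, py), phi(0.25, py, px)))  # True
```

Other properties, such as invariance when cases with X = 0 and Y = 0 are added (which phi does not satisfy), can be probed the same way.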

If your measure of codependence between X and Y is the type examined by a $$\chi^2$$ test of a 2 × 2 contingency table, then this answer also answers your question with respect to logistic regression and chi-square tests. Briefly:
• Asymptotically, all the logistic regression tests are equivalent. The likelihood-ratio test is based on the deviance, so those are not two separate tests. The score test for a logistic regression (not mentioned in your question) is exactly equivalent to a $$\chi^2$$ test without continuity correction. With large numbers of cases, when the underlying assumptions of asymptotic normality hold, they should all provide the same results.
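The score-test equivalence can be verified numerically by computing the score statistic for the slope by hand at the intercept-only null. A minimal sketch (the counts are illustrative; the variance formula profiles out the intercept for a single binary covariate):

```python
# Sketch: for a logistic regression of Y on a binary X, the score test of
# "slope = 0", evaluated at the intercept-only fit, reproduces the Pearson
# chi-square statistic without continuity correction (illustrative counts).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 20],   # X=0 row: counts of Y=0, Y=1
                  [10, 40]])  # X=1 row: counts of Y=0, Y=1
n0, n1 = table.sum(axis=1)
n = n0 + n1

# Expand the table to individual (x, y) observations
x = np.repeat([0, 1], [n0, n1])
y = np.concatenate([np.repeat([0, 1], table[0]), np.repeat([0, 1], table[1])])

# Under the null the fitted probability is constant: p_hat = mean(y)
p_hat = y.mean()
U = np.sum(x * (y - p_hat))             # score for the slope at the null
V = p_hat * (1 - p_hat) * n0 * n1 / n   # its variance, intercept profiled out
score_stat = U**2 / V

chi2, *_ = chi2_contingency(table, correction=False)
print(np.isclose(score_stat, chi2))  # True
```

The Wald and likelihood-ratio statistics for the same slope differ in finite samples but converge to this value as the cell counts grow.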