Comparison of statistical tests exploring co-dependence of two binary variables

Suppose we observe data (X_i, Y_i), i = 1, ..., n, on two binary variables X ∈ {0, 1} and Y ∈ {0, 1}. We would like to test whether X and Y are co-dependent (related). Standard suggestions in mainstream textbooks are the following:

  • chi-square test for independence of X and Y,
  • Z-test for comparing proportions of [Y=1] between two groups: [X=0] and [X=1],
  • Z-test for comparing proportions of [X=1] between two groups: [Y=0] and [Y=1].

In addition to that, we can run logistic regressions of Y on X and of X on Y and check the statistical significance of the slope coefficients. There are at least 3 standard tests for that: likelihood ratio, Wald, and deviance. Since we consider two regressions, this adds 3 × 2 = 6 tests, making the total number 9. But wait, we can run probit models too. Et cetera, et cetera, …
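As a quick numerical illustration of how much these tests overlap (a sketch with invented counts, using only pure Python), the squared two-proportion Z statistic with a pooled standard error coincides exactly with the Pearson chi-square statistic on the same 2 × 2 table:

```python
from math import sqrt

# Hypothetical 2x2 table of counts: rows X=1/X=0, columns Y=1/Y=0.
a, b, c, d = 30, 20, 10, 40
n = a + b + c + d

# Pearson chi-square statistic (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Two-proportion Z test comparing P(Y=1 | X=1) with P(Y=1 | X=0),
# with the pooled proportion in the standard error.
p1, p2 = a / (a + b), c / (c + d)
pooled = (a + c) / n
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / (a + b) + 1 / (c + d)))

print(chi2, z ** 2)  # identical: Z^2 equals the chi-square statistic
```

The agreement is algebraic, not approximate, so these two of the textbook suggestions are really one test.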

Is there one or more references which systematically and rigorously answer(s) the following questions:

  • Which tests are algebraically equivalent and when?
  • Which tests are most powerful, why and when?
  • What is the power function for each test and each sample size?
  • In practical terms, which tests deliver the same verdict almost always?


Tan et al., in Information Systems 29 (2004) 293–313, consider 21 different measures of association patterns between 2 binary variables. Each of these measures has its strengths and weaknesses. As the authors state in the Abstract:

Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. … In this paper, we describe several key properties one should examine in order to select the right measure for a given application. … We show that depending on its properties, each measure is useful for some application, but not for others.

The major issues concern the type of association in which one is interested and the properties of the measure that one wishes to maintain. For example, if P(X) is the probability of X = 1, P(Y) is the probability of Y = 1, and P(X,Y) is the probability that both are 1 in a 2 × 2 contingency table, which of the following matter to you in a measure of association:

  • Is the measure 0 when X and Y are statistically independent?
  • Does the measure increase with P(X,Y) while P(X) and P(Y) remain constant?
  • Does the measure decrease monotonically in either P(X) or P(Y) while the other probabilities remain constant?
  • Is the measure symmetric under permutation of X and Y?
  • Is it invariant to row and column scaling?
  • Is it antisymmetric under row or column permutation?
  • Is it invariant when both rows and columns are swapped?
  • Is it invariant when extra cases in which both X and Y are 0 are added?

No measure has all of these properties. The issue of which type of measure makes the most sense in a particular application thus would seem to be more crucial than generic considerations of power; the “verdict” might well depend on the type of association of interest.
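To make the first of those properties concrete (a toy sketch with invented counts; the phi coefficient and rule confidence are standard definitions, used here only for illustration), here is a table built to be exactly independent, on which phi is 0 but confidence is not:

```python
from math import sqrt

# Counts chosen so X and Y are exactly independent:
# P(X=1) = 0.5, P(Y=1) = 0.4, P(X=1, Y=1) = 0.2 = P(X=1) * P(Y=1).
a, b, c, d = 20, 30, 20, 30   # rows X=1/X=0, columns Y=1/Y=0

# phi coefficient: 0 under independence.
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# confidence of the rule X=1 -> Y=1: NOT 0 under independence.
confidence = a / (a + b)

print(phi, confidence)  # 0.0 versus 0.4
```

So a measure like confidence can report a seemingly strong "association" for variables that are statistically independent, which is exactly the kind of property clash Tan et al. catalogue.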

If your measure of codependence between X and Y is the type examined by a χ2 test of a 2 × 2 contingency table, then this answer also addresses your question with respect to logistic regression and chi-square tests. Briefly:

  • Asymptotically, all the logistic regression tests are equivalent. The likelihood-ratio test is based on deviance, so they’re not 2 separate tests. The score test for a logistic regression (not mentioned in your question) is exactly equivalent to a χ2 test without continuity correction. With large numbers of cases, when the underlying assumptions of normality hold, they should all provide the same results.

  • For a logistic regression model, the likelihood-ratio test is generally preferred. The Wald test assumes a symmetric, normal distribution of the log-likelihood profile around the maximum-likelihood estimate, which might not hold in a small sample. The score test is in general less powerful in practice. (I haven’t worked through the details specific to a 2-way contingency table, however.) Power functions would typically be calculated under the assumptions underlying the tests (effectively normality assumptions, as also underlie the Z-tests you note), which suggests to me that they would be the same under those assumptions. In practical applications involving small numbers of cases, such theoretical power functions might be misleading.

In answering this question, I assumed that the observations on variables X and Y are not paired. If they are paired then see the answer from @Alexis.

Source: Link, Question Author: stans, Answer Author: EdM
