As I understand it, matching is one way to identify causality in observational studies. By matching observations that are “similar” and comparing ones that did or did not receive treatment, you can consider this as a quasi-experiment of sorts.
What is overmatching? What kind of bias does it introduce? I have mostly seen matching from an economics perspective, but have recently seen some epidemiology papers suggesting that “overmatching” can result in bias. I find the terminology of those papers hard to follow and would greatly appreciate it if someone could help explain some of the main concepts. Below is an article that references the idea:
From Modern Epidemiology 3rd Edition by Rothman, Greenland and Lash:
There are at least three forms of overmatching. The first refers to matching that harms statistical efficiency, such as case-control matching on a variable associated with exposure but not disease. The second refers to matching that harms validity, such as matching on an intermediate between exposure and disease. The third refers to matching that harms cost-efficiency.
The answer from AndyW is about the second form of overmatching. Briefly, here’s how they all work:
1: To be a confounder, a covariate must, among other criteria, be associated with both the outcome and the exposure. If it’s only associated with one of them, it’s not a confounder, and all you’ve succeeded in doing is widening your confidence interval.
To explore this type of overmatching further, consider a matched case-control study of a binary exposure, with one control matched to each case on one or more confounders. Each stratum in the analysis will consist of one case and one control unless some strata can be combined. If the case and its matched control are either both exposed or both unexposed, one margin of the 2 x 2 table will be 0 … such a pair of subjects will not contribute any information to the analysis. If one stratifies on correlates of exposure, one will increase the chance that such tables will occur and thus tend to increase the information lost in stratified analysis.
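To make the point about uninformative pairs concrete, here is a minimal simulation sketch. Everything in it is an assumption of mine rather than something from the book: the binary variable `z`, the probabilities, the sample sizes, and the helper names are all made up for illustration. `z` is strongly associated with exposure but has no effect on disease, and the code compares how often a case and its control have the same exposure status when controls are matched on `z` versus picked at random.

```python
import random

def simulate_population(n, rng):
    """Return (z, exposed, diseased) tuples; z predicts exposure, not disease."""
    people = []
    for _ in range(n):
        z = rng.random() < 0.5
        exposed = rng.random() < (0.8 if z else 0.2)         # z -> exposure
        diseased = rng.random() < (0.2 if exposed else 0.1)  # exposure -> disease
        people.append((z, exposed, diseased))
    return people

def concordant_fraction(people, n_pairs, match_on_z, rng):
    """Fraction of case-control pairs in which both have the same exposure."""
    cases = [p for p in people if p[2]]
    controls = [p for p in people if not p[2]]
    controls_by_z = {z: [c for c in controls if c[0] == z] for z in (True, False)}
    concordant = 0
    for case in rng.sample(cases, n_pairs):
        # Matched design: draw the control from the case's own z stratum.
        pool = controls_by_z[case[0]] if match_on_z else controls
        control = rng.choice(pool)
        if case[1] == control[1]:
            concordant += 1  # pair carries no information in a matched analysis
    return concordant / n_pairs

rng = random.Random(0)
people = simulate_population(50_000, rng)
unmatched = concordant_fraction(people, 2_000, match_on_z=False, rng=rng)
matched = concordant_fraction(people, 2_000, match_on_z=True, rng=rng)
print(f"concordant pairs, random controls:    {unmatched:.2f}")
print(f"concordant pairs, controls matched z: {matched:.2f}")
```

With these made-up probabilities, roughly two thirds of the matched pairs should be concordant, against about half of the randomly paired ones. Since only discordant pairs contribute to a matched-pair odds ratio, matching on `z` throws away more of the sample without removing any confounding.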
2: This is partially discussed by AndyW. Matching on an intermediate factor (a mediator on the path from exposure to disease) will bias your estimate, as will matching on something affected by both the exposure and the outcome. The latter is essentially conditioning on a collider, and any technique that does so will bias your estimate.
If, however, the potential matching factor is affected by exposure and the factor in turn affects disease (i.e., is an intermediate variable), or is affected by both exposure and disease, then matching on the factor will bias both the crude and adjusted effect estimates. In these situations, case-control matching is nothing more than an irreparable form of selection bias.
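A small sketch of the intermediate-variable case, again under assumptions of my own: the probabilities are invented, and the whole effect of exposure on disease is assumed to run through a binary mediator `m`. It uses simple stratification on `m` in a simulated cohort as a stand-in for what matching on `m` followed by a stratified analysis conditions on; it is not a full matched case-control design.

```python
import random

def odds_ratio(rows):
    """Disease odds ratio, exposed vs. unexposed, from (exposed, diseased) pairs."""
    a = sum(1 for e, d in rows if e and d)          # exposed, diseased
    b = sum(1 for e, d in rows if e and not d)      # exposed, healthy
    c = sum(1 for e, d in rows if not e and d)      # unexposed, diseased
    h = sum(1 for e, d in rows if not e and not d)  # unexposed, healthy
    return (a * h) / (b * c)

rng = random.Random(1)
data = []
for _ in range(200_000):
    e = rng.random() < 0.5
    m = rng.random() < (0.7 if e else 0.2)  # exposure -> intermediate
    d = rng.random() < (0.3 if m else 0.1)  # intermediate -> disease (no direct path)
    data.append((e, m, d))

# Total effect of exposure, ignoring the intermediate.
crude_or = odds_ratio([(e, d) for e, m, d in data])

# Effect of exposure within levels of the intermediate.
or_given_m = {
    level: odds_ratio([(e, d) for e, m, d in data if m == level])
    for level in (False, True)
}

print(f"crude OR:        {crude_or:.2f}")
for level, value in or_given_m.items():
    print(f"OR within m={level}: {value:.2f}")
```

Under these assumptions the crude odds ratio should land near 1.9, while the odds ratios within each level of `m` should sit close to 1. Holding the intermediate fixed removes exactly the effect you set out to measure.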
3: This is more of a study design problem. Extensively matching on variables that you needn’t match on for reasons 1 & 2 can cause you to reject easily obtained controls (friends, family, nearby social network, etc.) in favor of controls that are far harder to obtain but can be matched on the unnecessary set of covariates. That costs money that could have been spent on more subjects, better exposure or disease ascertainment, etc., and it buys no appreciable reduction in bias or gain in precision. If anything, it threatens both.