# Confounder – definition

According to M. Katz in his book Multivariable analysis (Section 1.2, page 6), “A confounder is associated with the risk factor and causally related to the outcome.” Why must the confounder be causally related to the outcome? Would it be enough for the confounder to be associated with the outcome?

No, it’s not enough.

Let’s start with the case where you can have a variable which is both associated with the outcome and the treatment, but controlling for it would bias your estimate.

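For example, consider the following causal graph, taken from Pearl, where $Z$ is a pre-treatment collider. In this M-shaped structure, $Z$ is a common effect of two unobserved variables (call them $U_1$ and $U_2$), one of which also causes $X$ and the other $Y$: $X \leftarrow U_1 \rightarrow Z \leftarrow U_2 \rightarrow Y$, together with $X \rightarrow Y$.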

In this case there is no confounding: you can estimate the effect of $X$ on $Y$ directly.

Notice, however, that Z is associated both with the treatment and with the outcome. But it’s still not a confounder. In fact, if you control for Z in this case you would bias your estimate. This situation is called M-bias (because of the graph structure).
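To make the M-bias point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not taken from Pearl: the variables are linear and Gaussian, the unobserved causes are named `u1` and `u2`, the true effect of `x` on `y` is set to 1, and `ols_coef` is a hypothetical helper built on NumPy least squares.

```python
import numpy as np

def ols_coef(y, *regressors):
    """OLS slopes of y on the given regressors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(0)
n = 200_000

# Assumed M-structure: x <- u1 -> z <- u2 -> y, plus x -> y with effect 1.
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
z = u1 + u2 + rng.normal(size=n)       # pre-treatment collider
x = u1 + rng.normal(size=n)            # treatment
y = 1.0 * x + u2 + rng.normal(size=n)  # outcome

unadjusted = ols_coef(y, x)[0]    # close to the true effect, 1.0
adjusted = ols_coef(y, x, z)[0]   # biased: about 0.8 under these assumptions

print(f"corr(z, x) = {np.corrcoef(z, x)[0, 1]:.2f}")
print(f"corr(z, y) = {np.corrcoef(z, y)[0, 1]:.2f}")
print(f"unadjusted = {unadjusted:.3f}, adjusted for z = {adjusted:.3f}")
```

Under these assumed coefficients, `z` is clearly correlated with both `x` and `y`, yet the unadjusted regression recovers the effect and the adjusted one does not.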

A similar but more straightforward case where you should not control is when the variable is a result of both the treatment $X$ and the outcome $Y$. Take this simple collider graph, $X \rightarrow Z \leftarrow Y$:

Here, again, $Z$ is associated with $X$ and $Y$, but it’s not a confounder. You should not control for it.
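The same kind of sketch works for the simple collider. Again, the linear Gaussian setup, the unit coefficients, and the hypothetical `ols_coef` helper are assumptions made only for illustration; the true effect of `x` on `y` is set to 1.

```python
import numpy as np

def ols_coef(y, *regressors):
    """OLS slopes of y on the given regressors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(1)
n = 200_000

# Assumed structure: x -> y (effect 1) and x -> z <- y.
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)
z = x + y + rng.normal(size=n)    # common effect of treatment and outcome

unadjusted = ols_coef(y, x)[0]    # close to the true effect, 1.0
adjusted = ols_coef(y, x, z)[0]   # about 0.0 under these assumptions

print(f"unadjusted = {unadjusted:.3f}, adjusted for z = {adjusted:.3f}")
```

With these particular coefficients, conditioning on the collider happens to wipe out the estimated effect entirely; other coefficients would give a different, but still biased, answer.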

Now, it’s worth noticing that even if a variable is causally related to the outcome, it’s also not necessarily a confounder.

Let’s take the case of mediators, as in the simple graph below, where $M$ lies on the causal path $D \rightarrow M \rightarrow Y$:

If you want to measure the total effect of D on Y, you should not control for things that mediate the effect — in this case M. That is, M is causally related to Y, yet it’s not a confounder with respect to the total effect of D on Y either.
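A short simulation illustrates the mediator case as well. The setup is an assumption for illustration only: linear Gaussian variables, an indirect path `d -> m -> y` with coefficients 1 and 2, and an added direct path `d -> y` with coefficient 0.5 (the direct path is not implied by the text above), so the total effect of `d` on `y` is 2.5. `ols_coef` is a hypothetical helper.

```python
import numpy as np

def ols_coef(y, *regressors):
    """OLS slopes of y on the given regressors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(2)
n = 200_000

# Assumed structure: d -> m -> y, plus a direct edge d -> y.
d = rng.normal(size=n)
m = 1.0 * d + rng.normal(size=n)            # mediator
y = 0.5 * d + 2.0 * m + rng.normal(size=n)  # outcome

total_effect = 0.5 + 1.0 * 2.0              # direct + indirect = 2.5

unadjusted = ols_coef(y, d)[0]    # close to the total effect, 2.5
adjusted = ols_coef(y, d, m)[0]   # only the direct part, about 0.5

print(f"total effect = {total_effect}")
print(f"unadjusted = {unadjusted:.3f}, adjusted for m = {adjusted:.3f}")
```

Controlling for `m` here answers a different question (the direct effect), which is why it is the wrong adjustment when the target is the total effect.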

Notice, however, that defining confounding is much easier than defining what a confounder is. For a more rigorous discussion of the definition of a confounder, you might want to read this paper by VanderWeele and Shpitser.

Why is this the case? Because the primary concept here is confounding itself, not the confounder. For your research question, you should ask yourself “how can I eliminate confounding?” instead of “is this variable a confounder?”.

As a final note, it’s worth mentioning that these misconceptions are still widespread. To illustrate, take this quotation from a 2016 paper:

> Causal inference in the absence of a randomized experiment or strong quasi-experimental design requires appropriately conditioning on all pre-treatment variables that predict both treatment and outcome, also known as confounding covariates.

As the previous examples show, this is incorrect. Confounders are not “all pre-treatment variables that predict both treatment and outcome”. Controlling for all of them might not be necessary for eliminating confounding, and it could even bias your results. Pearl has a very good overview on confounding here.