Relationships between correlation and causation

From the Wikipedia page titled “Correlation does not imply causation”:

For any two correlated events, A and B, the different possible relationships include:

  1. A causes B (direct causation);
  2. B causes A (reverse causation);
  3. A and B are consequences of a common cause, but do not cause each
    other;
  4. A and B both cause C, which is (explicitly or implicitly)
    conditioned on;
  5. A causes B and B causes A (bidirectional or cyclic causation);
  6. A causes C which causes B (indirect causation);
  7. There is no connection between A and B; the correlation is a
    coincidence.

What does the fourth point mean? “A and B both cause C, which is (explicitly or implicitly) conditioned on.” If A and B cause C, why do A and B have to be correlated?


“Conditioning” is a term from probability theory.

Conditioning on C means that we only look at cases where C is true. “Implicitly” means that we may not state this restriction explicitly, and may not even be aware that we are making it.

The point means that, when A and B both cause C, observing a correlation between A and B among the cases where C is true does not mean there is a real relationship between A and B. It is the conditioning on C (perhaps unwitting) that creates an artificial correlation.

Let’s take an example.

In a country there exist exactly two diseases, perfectly independent of each other. Call A : “person has the first disease”, B : “person has the second disease”. Assume P(A)=0.1, P(B)=0.1.

Now any person who has at least one of these diseases goes to see the doctor, and nobody else does. Call C : “person goes to see the doctor”. We have C = (A or B).

Now let’s calculate a few probabilities :

  • P(C) = P(A) + P(B) - P(A)P(B) = 0.1 + 0.1 - 0.01 = 0.19
  • P(A|C) = P(B|C) = 0.1/0.19 ≈ 0.53
  • P(A and B|C) = 0.01/0.19 ≈ 0.05
  • P(A|C)P(B|C) ≈ 0.28
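The figures in this list can be checked exactly with a few lines of Python. This is only a sketch under the example's assumptions (A and B independent, P(A)=P(B)=0.1, C = A or B); the variable names are illustrative and not part of the original answer.

```python
from fractions import Fraction

# Assumptions taken from the example: A and B are independent,
# each with probability 0.1, and C is the event "A or B".
p_a = Fraction(1, 10)
p_b = Fraction(1, 10)

# Independence gives P(A and B); inclusion-exclusion gives P(C).
p_a_and_b = p_a * p_b
p_c = p_a + p_b - p_a_and_b

# A, B and (A and B) each imply C, so P(X and C) = P(X) for each of them.
p_a_given_c = p_a / p_c
p_b_given_c = p_b / p_c
p_ab_given_c = p_a_and_b / p_c

# If A and B were independent given C, this product would equal P(A and B | C).
product = p_a_given_c * p_b_given_c

print("P(C)            =", float(p_c))
print("P(A|C) = P(B|C) =", float(p_a_given_c))
print("P(A and B|C)    =", float(p_ab_given_c))
print("P(A|C)P(B|C)    =", float(product))
```

Using exact fractions avoids rounding questions: P(A and B|C) comes out as 1/19 while P(A|C)P(B|C) is 100/361, so the two differ by roughly a factor of five.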

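Clearly, when conditioned on C, A and B are very far from being independent: P(A and B|C) is much smaller than P(A|C)P(B|C). In fact, conditioned on C, “not A” seems to “cause” B, since anyone who sees the doctor without having the first disease must have the second.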

If you use the list of persons who were recorded by their doctor(s) as the data source for an analysis, then there seems to be a strong (here negative) correlation between diseases A and B. You may not be aware that your choice of data source is itself a form of conditioning. This is also called “selection bias”.
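A small simulation can make the selection effect visible. The sketch below is not from the original answer: the function names `pearson` and `simulate`, the population size, and the seed are all arbitrary choices for illustration. It draws the two diseases independently, then compares the correlation in the whole population with the correlation among doctor's patients only.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=200_000, p=0.1, seed=0):
    """Return (correlation in whole population, correlation among patients)."""
    rng = random.Random(seed)
    # Each person gets disease A and disease B independently with probability p.
    population = [(int(rng.random() < p), int(rng.random() < p)) for _ in range(n)]
    # Only people with at least one disease show up in the doctor's records (event C).
    patients = [(a, b) for a, b in population if a or b]
    corr_all = pearson(*zip(*population))
    corr_patients = pearson(*zip(*patients))
    return corr_all, corr_patients


corr_all, corr_patients = simulate()
print("correlation in whole population:", round(corr_all, 3))
print("correlation among patients only:", round(corr_patients, 3))
```

In the whole population the correlation should be close to zero, while among patients it should be close to -0.9, which is the exact value implied by the example's probabilities (covariance -81/361 divided by variance 90/361).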

Source : Link , Question Author : matt , Answer Author : kjetil b halvorsen
