# Understanding d-separation theory in causal Bayesian networks

I am trying to understand the logic of d-separation in causal Bayesian networks. I know how the algorithm works, but I don't understand why the “flow of information” behaves the way the algorithm says it does.

For example, in the graph above, let's suppose that only X is observed and no other variable is. Then, according to the rules of d-separation, information flows from X toward D as follows:

1. X influences A, that is, $P(A)\neq P(A|X)$. This is OK: A causes X, so knowing the effect X changes our belief about its cause A. Information flows.

2. X influences B, that is, $P(B)\neq P(B|X)$. This is OK: our belief about A has been changed by our knowledge of X, and that change can in turn influence our belief about A's cause, B.

3. X influences C, that is, $P(C)\neq P(C|X)$. This is OK: our belief about B is biased by our knowledge of its indirect effect X, and this bias carries over to all of B's direct and indirect effects. C is a direct effect of B, so it is influenced by our knowledge of X.

Up to this point everything makes sense to me, since the information flows along intuitive cause-effect relationships. What I don't get is the special behavior of the so-called “V-structures” or “colliders” in this scheme. In the graph above, B and D are both causes of C, and d-separation theory says that if we have observed neither C nor any of its descendants, the flow of information from X is blocked at C. OK, but my question is: why?
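The claim itself can be checked numerically. The sketch below is a minimal brute-force check, assuming the graph is B → A → X and B → C ← D (as described in the steps above) with binary variables. All probability tables are made-up numbers for illustration only, and the helper names `bern`, `joint`, and `prob` are hypothetical.

```python
from itertools import product

# Hypothetical CPTs for binary variables (illustrative numbers only).
# Assumed graph: B -> A -> X and B -> C <- D.
P_B = {0: 0.7, 1: 0.3}                      # P(B=b)
P_D = {0: 0.4, 1: 0.6}                      # P(D=d)
P_A1_GIVEN_B = {0: 0.2, 1: 0.9}             # P(A=1 | B=b)
P_X1_GIVEN_A = {0: 0.1, 1: 0.8}             # P(X=1 | A=a)
P_C1_GIVEN_BD = {(0, 0): 0.05, (0, 1): 0.5,
                 (1, 0): 0.6, (1, 1): 0.95}  # P(C=1 | B=b, D=d)

NAMES = ("x", "a", "b", "c", "d")


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x, a, b, c, d):
    """Joint probability from the factorization implied by the graph."""
    return (P_B[b] * P_D[d]
            * bern(P_A1_GIVEN_B[b], a)
            * bern(P_X1_GIVEN_A[a], x)
            * bern(P_C1_GIVEN_BD[(b, d)], c))


def prob(query, given=None):
    """P(query | given) by enumerating all 2^5 joint assignments."""
    given = given or {}
    num = den = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in given.items()):
            continue  # inconsistent with the evidence
        p = joint(**world)
        den += p
        if all(world[k] == v for k, v in query.items()):
            num += p
    return num / den


p_c = prob({"c": 1})
p_c_x = prob({"c": 1}, {"x": 1})              # step 3: X informs C
p_d = prob({"d": 1})
p_d_x = prob({"d": 1}, {"x": 1})              # collider C unobserved
p_d_c = prob({"d": 1}, {"c": 1})
p_d_cx = prob({"d": 1}, {"c": 1, "x": 1})     # collider C observed

print("P(C=1) vs P(C=1|X=1):        ", p_c, p_c_x)
print("P(D=1) vs P(D=1|X=1):        ", p_d, p_d_x)
print("P(D=1|C=1) vs P(D=1|C=1,X=1):", p_d_c, p_d_cx)
```

With these tables, the first pair differs (X does change the belief about C), the second pair agrees up to floating-point error (X tells us nothing about D while C is unobserved), and the third pair differs again (once C is observed, X does inform D). This only confirms what the theory states for one parametrization; it does not by itself explain the behavior.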

In the three steps above, starting from X, we saw that our belief about C is influenced by our knowledge of X, with the information flowing along cause-effect relationships. d-separation theory says that we cannot continue from C to D because C is not observed. But my reasoning is that since our belief about C is biased and D is a cause of C, our belief about D should be affected as well, while the theory says the opposite. I am clearly missing something in my reasoning, but I can't see what it is.

So I need an explanation of why the flow of information is blocked at C when C is not observed.