# How is causation defined mathematically?

What is the mathematical definition of a causal relationship between two random variables?

Given a sample from the joint distribution of two random variables $$X$$ and $$Y$$, when would we say $$X$$ causes $$Y$$?

> What is the mathematical definition of a causal relationship between two random variables?

Mathematically, a causal model consists of functional relationships between variables. For instance, consider the system of structural equations below:

$$x = f_x(\epsilon_{x})\\ y = f_y(x, \epsilon_{y})$$

This means that $$x$$ functionally determines the value of $$y$$ (intervening on $$x$$ changes the value of $$y$$) but not the other way around. Graphically, this is usually represented by $$x \rightarrow y$$, which means that $$x$$ enters the structural equation of $$y$$. As an addendum, a causal model can also be expressed in terms of joint distributions of counterfactual variables, which is mathematically equivalent to the functional formulation.
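To make the asymmetry concrete, here is a minimal simulation sketch in Python. The specific forms $$f_x(\epsilon_x) = \epsilon_x$$ and $$f_y(x, \epsilon_y) = 2x + \epsilon_y$$, the standard normal noise, and the helper name `sample_scm` are illustrative assumptions, not part of the model above. Setting $$x$$ by intervention shifts $$y$$, while setting $$y$$ by intervention leaves $$x$$ untouched.

```python
import random


def sample_scm(n, rng, do_x=None, do_y=None):
    """Draw n samples from the toy model, optionally under an intervention."""
    samples = []
    for _ in range(n):
        eps_x = rng.gauss(0.0, 1.0)
        eps_y = rng.gauss(0.0, 1.0)
        # An intervention replaces a structural equation with a constant.
        x = eps_x if do_x is None else do_x               # assumed f_x
        y = 2.0 * x + eps_y if do_y is None else do_y     # assumed f_y
        samples.append((x, y))
    return samples


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
n = 20000

observational = sample_scm(n, rng)
under_do_x = sample_scm(n, rng, do_x=3.0)   # do(x = 3)
under_do_y = sample_scm(n, rng, do_y=3.0)   # do(y = 3)

mean_y_obs = mean([y for _, y in observational])
mean_y_do_x = mean([y for _, y in under_do_x])
mean_x_obs = mean([x for x, _ in observational])
mean_x_do_y = mean([x for x, _ in under_do_y])

print("E[y], observational:", round(mean_y_obs, 2))
print("E[y] under do(x=3): ", round(mean_y_do_x, 2))
print("E[x], observational:", round(mean_x_obs, 2))
print("E[x] under do(y=3): ", round(mean_x_do_y, 2))
```

In this toy model the mean of $$y$$ moves from about 0 to about 6 under the intervention on $$x$$, whereas the mean of $$x$$ stays near 0 under the intervention on $$y$$.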

> Given a sample from the joint distribution of two random variables $$X$$ and $$Y$$, when would we say $$X$$ causes $$Y$$?

Often (indeed, most of the time) you do not know the form of the structural equations $$f_{x}$$ and $$f_y$$, nor even whether $$x\rightarrow y$$ or $$y \rightarrow x$$. The only information you have is the joint probability distribution $$p(y,x)$$ (or samples from it).

This leads to your question: when can I recover the direction of causality just from the data? Or, more precisely, when can I recover whether $$x$$ enters the structural equation of $$y$$ or vice-versa, just from the data?

Of course, without some fundamentally untestable assumptions about the causal model, this is impossible. The problem is that several different causal models can entail the same joint probability distribution of the observed variables. The most common example is a linear causal system with Gaussian noise.
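To see why the linear Gaussian case cannot be resolved from data alone, here is a short worked example. The zero means, the unit variance of $$x$$, and the symbols $$a$$, $$b$$ and $$\sigma^2$$ are choices made here for illustration. Suppose the true model is

$$x = \epsilon_x, \qquad y = a x + \epsilon_y, \qquad \epsilon_x \sim N(0, 1), \quad \epsilon_y \sim N(0, \sigma^2), \quad \epsilon_x \perp \epsilon_y$$

Then $$(x, y)$$ is jointly Gaussian with zero mean and

$$\mathrm{Var}(x) = 1, \qquad \mathrm{Cov}(x, y) = a, \qquad \mathrm{Var}(y) = a^2 + \sigma^2$$

Now consider the reversed model

$$y = \tilde\epsilon_y, \qquad x = b y + \tilde\epsilon_x, \qquad b = \frac{a}{a^2 + \sigma^2}, \qquad \tilde\epsilon_y \sim N(0, a^2 + \sigma^2), \quad \tilde\epsilon_x \sim N\left(0, \frac{\sigma^2}{a^2 + \sigma^2}\right), \quad \tilde\epsilon_x \perp \tilde\epsilon_y$$

It gives

$$\mathrm{Var}(y) = a^2 + \sigma^2, \qquad \mathrm{Cov}(x, y) = b (a^2 + \sigma^2) = a, \qquad \mathrm{Var}(x) = b^2 (a^2 + \sigma^2) + \frac{\sigma^2}{a^2 + \sigma^2} = 1$$

Since a zero-mean Gaussian distribution is fully determined by its covariance matrix, both models entail exactly the same $$p(y,x)$$, even though one has $$x \rightarrow y$$ and the other $$y \rightarrow x$$.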

But under some causal assumptions this might be possible, and this is what the causal discovery literature works on. If you have no prior exposure to this topic, you might want to start with Elements of Causal Inference by Peters, Janzing and Schölkopf, as well as chapter 2 of Causality by Judea Pearl. We have a topic here on CV for references on causal discovery, but we don’t have that many references listed there yet.

Therefore, there isn’t just one answer to your question, since it depends on the assumptions one makes. The paper you mention cites some examples, such as assuming a linear model with non-Gaussian noise. This case is known as LiNGAM (short for linear non-Gaussian acyclic model). Here is an example in R:

    library(pcalg)
    set.seed(1234)
    n <- 500

    # two non-Gaussian noise terms
    eps1 <- sign(rnorm(n)) * sqrt(abs(rnorm(n)))
    eps2 <- runif(n) - 0.5

    # true model: x2 -> x1
    x2 <- 3 + eps2
    x1 <- 0.9*x2 + 7 + eps1

    # runs lingam
    X <- cbind(x1, x2)
    res <- lingam(X)
    as(res, "amat")

    # Adjacency Matrix 'amat' (2 x 2) of type ‘pag’:
    #     [,1]  [,2]
    # [1,] .     .
    # [2,]  TRUE .


Notice that here we have a linear causal model with non-Gaussian noise in which $$x_2$$ causes $$x_1$$, and lingam correctly recovers the causal direction. However, this depends critically on the LiNGAM assumptions.
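One way to build intuition for why non-Gaussian noise helps is to regress in both directions and check whether the residual is independent of the regressor: in the causal direction it should be, in the reverse direction generally not. The sketch below is a simplified illustration of that idea in Python, not the ICA-based procedure that `lingam` implements. The unit coefficient, the uniform noise on $$[-1, 1]$$, the helper names, and the use of the correlation between squared values as a crude dependence measure are all assumptions made for this example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(u, v):
    """Sample Pearson correlation between two equally long lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def ols_residuals(target, regressor):
    """Residuals of the least-squares fit target ~ intercept + slope * regressor."""
    mt, mr = mean(target), mean(regressor)
    slope = (sum((r - mr) * (t - mt) for r, t in zip(regressor, target))
             / sum((r - mr) ** 2 for r in regressor))
    return [(t - mt) - slope * (r - mr) for t, r in zip(target, regressor)]


def squared_dependence(residuals, regressor):
    """Correlation between squared residuals and squared centered regressor.

    Residuals are always uncorrelated with the regressor, so a higher-order
    statistic is needed to detect dependence.
    """
    mr = mean(regressor)
    return pearson([e ** 2 for e in residuals],
                   [(r - mr) ** 2 for r in regressor])


rng = random.Random(1)
n = 5000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]    # cause
y = [xi + rng.uniform(-1.0, 1.0) for xi in x]     # effect: y = x + noise

dep_causal = squared_dependence(ols_residuals(y, x), x)    # fit y ~ x
dep_reverse = squared_dependence(ols_residuals(x, y), y)   # fit x ~ y

print("dependence score, y ~ x:", round(dep_causal, 3))
print("dependence score, x ~ y:", round(dep_reverse, 3))
```

In this setup the score for the causal direction should be close to zero, while the score for the reverse direction should be clearly negative. With Gaussian noise instead, both scores would be close to zero and the two directions could not be told apart this way, which matches the linear Gaussian example mentioned earlier.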

For the case of the paper you cite, they make this specific assumption (see their “postulate”):

> If $$x\rightarrow y$$, the minimal description length of the mechanism mapping X to Y is independent of the value of X, whereas the minimal description length of the mechanism mapping Y to X is dependent on the value of Y.

Note that this is an assumption; it is what we would call their “identification condition”. Essentially, the postulate imposes restrictions on the joint distribution $$p(x,y)$$: if $$x \rightarrow y$$, certain restrictions hold in the data, and if $$y \rightarrow x$$, other restrictions hold. Restrictions of this type, which have testable implications (they impose constraints on $$p(y,x)$$), are what allow one to recover directionality from observational data.

As a final remark, causal discovery results are still very limited and depend on strong assumptions, so be careful when applying them in real-world contexts.