# Law of total expectation/tower rule: Why must both random variables come from the same probability space?

I quote (emphasis mine) from the wikipedia definition:

> The proposition in probability theory known as the law of total expectation, …, states that if X is an integrable random variable (i.e., a random variable satisfying E( | X | ) < ∞) and Y is any random variable, not necessarily integrable, **on the same probability space**, then
> $$\operatorname{E}(X) = \operatorname{E} ( \operatorname{E} ( X \mid Y))$$

I don’t understand what they mean by the same probability space, and do not know why this is an important part of the definition. Take the example further down on the page:

> Suppose that two factories supply light bulbs to the market. Factory
> X’s bulbs work for an average of 5000 hours, whereas factory Y’s bulbs
> work for an average of 4000 hours. It is known that factory X supplies
> 60% of the total bulbs available. What is the expected length of time
> that a purchased bulb will work for?

The random variables here seem to be:

1. The amount of time a light bulb lasts for.
2. Which factory a light bulb comes from.

How can these two have the same probability space?

> I don’t understand what they mean by the same probability space

That’s the problem.

The standard way to think of the objects of probability theory (random variables, distributions, etc.) is through Kolmogorov’s axioms. These axioms are framed in the language of measure theory, but it’s quite possible to understand simple cases without any measure theory.

Basically, a probability model consists of three things: a set $\Omega$, whose individual elements you can think of as summarising the “true state of the world” (or at least all you need to know about it); a collection $\mathcal{F}$ of subsets of $\Omega$ (whose elements are the possible events whose probability you may need to measure); and a probability measure $P$, which is a function that takes an event $E \in \mathcal{F}$ and spits out a number $P(E) \in [0, 1]$ (whose interpretation is the probability that event $E$ occurs). The triple $(\Omega, \mathcal{F}, P)$ is known as a probability space as long as it satisfies certain natural properties (for instance, the probability of a union of countably many disjoint events is the sum of their probabilities).
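For a finite $\Omega$, all three ingredients can be written down explicitly. Here is a minimal sketch in Python (the coin-flip sample space and the weights are purely illustrative, not part of the question):

```python
from itertools import chain, combinations

# Omega: a finite sample space; each element is one "true state of the world".
omega = {"heads", "tails"}

# F: the collection of events. For a finite space we may simply take every subset.
events = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# P: a probability measure, built here from weights on individual outcomes.
weights = {"heads": 0.5, "tails": 0.5}

def P(event):
    return sum(weights[w] for w in event)

# Kolmogorov's axioms hold: P(Omega) = 1, and P is additive over disjoint events.
assert P(omega) == 1.0
assert P({"heads"}) + P({"tails"}) == P({"heads", "tails"})
```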

In this framework, a random variable $X$ is a (measurable) function from $\Omega$ to $\mathbb{R}$. In your example, we have two random variables: $T$ (the amount of time a light bulb lasts) and $F$ (which factory a light bulb comes from).

> How can these two have the same probability space?

The question now amounts to: how do we define a probability space $(\Omega, \mathcal{F}, P)$ and functions $T, F : \Omega \to \mathbb{R}$ in such a way as to model the problem under consideration? There are many ways, but a simple one is to let $\Omega = \{ (f, t) : f \in \{0, 1\},\ t > 0 \}$. An element $(f, t) \in \Omega$ specifies a particular (non-random) light bulb from factory $f$ that will last for time $t$. Then we would define $T(f, t) = t$ and $F(f, t) = f$. The joint distribution of $(T, F)$ is then defined by specifying $\mathcal{F}$ and $P$.
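This construction can be simulated directly. In the sketch below (Python), outcomes $\omega = (f, t)$ are drawn from $\Omega$, and $T$ and $F$ are literally functions of $\omega$; the exponential shape of each factory's lifetime distribution is an assumption for illustration, since the question only specifies the means:

```python
import random

random.seed(0)

MEAN_LIFE = {0: 5000.0, 1: 4000.0}  # factory 0 = X, factory 1 = Y
P_FACTORY_X = 0.6                   # factory X supplies 60% of bulbs

def sample_omega():
    """Draw one outcome (f, t) from the sample space Omega."""
    f = 0 if random.random() < P_FACTORY_X else 1
    # The lifetime distribution is assumed exponential; only its mean is given.
    t = random.expovariate(1.0 / MEAN_LIFE[f])
    return (f, t)

# Random variables are just functions from Omega to the reals:
def T(omega):  # how long the bulb lasts
    return omega[1]

def F(omega):  # which factory it came from
    return omega[0]

samples = [sample_omega() for _ in range(100_000)]
mean_T = sum(T(w) for w in samples) / len(samples)
# mean_T should land near 0.6 * 5000 + 0.4 * 4000 = 4600
```

Note that both $T$ and $F$ are evaluated on the *same* sampled outcome $\omega$, which is exactly what "defined on the same probability space" buys you.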

> I don’t understand… why this is an important part of the definition

The conditional expectation $E[X \mid Y]$ of a random variable $X$ given another random variable $Y$ is itself defined to be a type of random variable satisfying certain properties. You can find the formal definition here; however, it may seem quite arcane if you are not familiar with measure-theoretic probability. Basically, this definition does not make sense if $X$ and $Y$ are not defined on the same probability space. Ultimately, though, it is usually not problematic to define two random variables on a common probability space, so this condition amounts to a technicality.
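In the light-bulb example, where $T$ and $F$ are both defined on the space $\Omega$ described earlier, the tower rule reduces to the familiar weighted-average computation:

$$\operatorname{E}(T) = \operatorname{E}\big(\operatorname{E}(T \mid F)\big) = 0.6 \times 5000 + 0.4 \times 4000 = 4600 \text{ hours}.$$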