How should I mentally deal with Borel’s paradox?

I feel a bit uneasy with how I’ve mentally dealt with Borel’s paradox and other associated “paradoxes” dealing with conditional probability. For those who are reading this that aren’t familiar with it, see this link. My mental response up to this point has been mostly to ignore it because no one seems to talk about it, but I feel I should rectify this.

We know that this paradox exists, and yet in practice (Bayesian analysis being an extreme example) we are perfectly fine with conditioning on events of measure 0: if X is the data, we condition on X=x all the time, even though {X=x} has measure 0 when X is continuous. And we certainly make no effort to construct a sequence of events converging to the event we observed in order to resolve the paradox, at least not explicitly.
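For concreteness, here is a minimal Monte Carlo sketch (my own illustration, not from the original posts) of the classical sphere version of the paradox: for a point uniform on the sphere, "condition on a great circle" gives different answers depending on which shrinking sequence of positive-probability events you use. Shrinking latitude bands make the equator uniform, while shrinking longitude wedges give a meridian the non-uniform density cos(lat)/2.

```python
import math
import random

# Borel-Kolmogorov paradox, numerically: two shrinking sequences of events,
# both converging to "a great circle", yield different conditional laws.
random.seed(0)

def uniform_sphere():
    """Uniform point on the unit sphere via a normalized Gaussian vector."""
    while True:
        x, y, z = (random.gauss(0, 1) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r > 1e-12:
            return x / r, y / r, z / r

n, eps = 200_000, 0.1
lat_in_meridian_wedge = []  # condition on |longitude| < eps (a meridian)
lon_in_equator_band = []    # condition on |latitude| < eps (the equator)
for _ in range(n):
    x, y, z = uniform_sphere()
    lat, lon = math.asin(z), math.atan2(y, x)
    if abs(lon) < eps:
        lat_in_meridian_wedge.append(lat)
    if abs(lat) < eps:
        lon_in_equator_band.append(lon)

# Conditioned on the equator band, longitude is uniform on [-pi, pi]:
# mean |longitude| should be near pi/2 ~ 1.571.
mean_abs_lon = sum(map(abs, lon_in_equator_band)) / len(lon_in_equator_band)

# Conditioned on the meridian wedge, latitude has density cos(lat)/2,
# NOT the uniform density on the circle: mean |latitude| should be near
# pi/2 - 1 ~ 0.571, well below the uniform-circle value pi/4 ~ 0.785.
mean_abs_lat = sum(map(abs, lat_in_meridian_wedge)) / len(lat_in_meridian_wedge)
print(mean_abs_lon, mean_abs_lat)
```

Both limiting events are great circles of the same sphere, yet the two conditional distributions disagree, which is exactly the ambiguity the paradox trades on.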

I think this is okay because we have essentially fixed the random variable X (in principle) before the experiment, and so we are conditioning on σ(X). That is, σ(X) is the natural σ-algebra to condition on because the information X=x comes to us through X – if it had come to us in some other fashion, we would condition on a different σ-algebra. Borel’s paradox arises (I guess) because it isn’t obvious what the appropriate σ-algebra to condition on is, but the Bayesian has specified σ(X). Because we specify a priori that the information X=x came to us by measuring X, we are in the clear. Once we have specified the σ-algebra, everything is fine: we construct our conditional expectation via the Radon–Nikodym theorem, and everything is unique up to null sets.

Is this essentially right, or am I way off? If I’m way off, what is the justification for behaving as we do? [Given the Q&A nature of this site, regard this as my question.] When I took measure-theoretic probability, we never even touched conditional expectation, for reasons I don’t understand. As a result, I’m worried that my ideas are very confused.


As a Bayesian, I would say Borel’s paradox has nothing (or very little) to do with Bayesian statistics – except that Bayesian statistics uses conditional distributions, of course. The reason there is no paradox in defining a posterior distribution by conditioning on a set of measure zero {X=x} is that x is not chosen in advance, but obtained as the result of the observation. Thus, if we were to use exotic definitions for the conditional distributions on some sets of measure zero, there is zero probability that those sets would contain the x we eventually observe. The conditional distribution is defined uniquely almost everywhere, hence almost surely with respect to our observation. This is also the meaning of the (great) quote from A. Kolmogorov in the Wikipedia entry.
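The point above can be sketched numerically (the model and numbers here are my own illustration, not from the answer): take two "versions" of the sampling density f(x|θ) that disagree only on the Lebesgue-null set {x = 0}; a continuously drawn observation almost surely misses that set, so both versions yield the identical posterior at the observed x.

```python
import math
import random

# Two versions of the same conditional density, differing on a null set of
# x values, give the same posterior at a continuously drawn observation.
random.seed(1)

def f(x, theta):
    """Standard version: N(theta, 1) density in x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

def f_exotic(x, theta):
    """Exotic version: redefined arbitrarily on the null set {x = 0}."""
    return 17.0 if x == 0.0 else f(x, theta)

theta_true = 0.3
x_obs = random.gauss(theta_true, 1)  # continuous draw: P(x_obs == 0) = 0

# Unnormalized posteriors under a N(0, 1) prior, on a grid of theta values.
grid = [i / 100 for i in range(-300, 301)]
prior = lambda t: math.exp(-0.5 * t * t)
post = [prior(t) * f(x_obs, t) for t in grid]
post_exotic = [prior(t) * f_exotic(x_obs, t) for t in grid]
print(x_obs != 0.0, post == post_exotic)
```

Since the observed x almost surely avoids any fixed null set, the "exotic" choice of version is invisible to the posterior – which is the sense in which the posterior is well defined almost surely with respect to the observation.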

A spot in Bayesian analysis where measure-theoretic subtleties may turn into a paradox is the Savage-Dickey representation of the Bayes factor, since it depends on a specific version of the prior density (as discussed in our paper on the topic…)

Source: Link, Question Author: guy, Answer Author: Xi’an