Different probability density transformations due to Jacobian factor

In Bishop’s Pattern Recognition and Machine Learning I read the following, just after the probability density p(x\in(a,b))=\int_a^bp(x)\textrm{d}x was introduced:

Under a nonlinear change of variable, a probability density transforms
differently from a simple function, due to the Jacobian factor. For
instance, if we consider a change of variables x = g(y), then a
function f(x) becomes \tilde{f}(y) = f(g(y)). Now consider a
probability density p_x(x) that corresponds to a density p_y(y)
with respect to the new variable y, where the suffices denote the
fact that p_x(x) and p_y(y) are different densities. Observations
falling in the range (x, x + \delta x) will, for small values of
\delta x, be transformed into the range (y, y + \delta y) where
p_x(x)\delta x \simeq p_y(y)\delta y, and hence p_y(y) = p_x(x) \left| \frac{dx}{dy} \right| = p_x(g(y)) |g^\prime(y)|.

What is the Jacobian factor, and what exactly does all of this mean (maybe qualitatively)? Bishop says that a consequence of this property is that the concept of the maximum of a probability density depends on the choice of variable. What does this mean?

To me this all comes a bit out of the blue (considering it’s in the introductory chapter). I’d appreciate some hints, thanks!


I suggest reading the solution to Question 1.4, which provides good intuition.

In a nutshell, if you have an arbitrary function f(x) and two variables x and y related by x = g(y), then you can find the maximum of the function either by analyzing f(x) directly, \hat{x} = argmax_x f(x), or by analyzing the transformed function f(g(y)), \hat{y} = argmax_y f(g(y)). Not surprisingly, \hat{x} and \hat{y} are related by \hat{x} = g(\hat{y}) (here I assume that g^\prime(y) \neq 0 for all y).
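To see where the relation \hat{x} = g(\hat{y}) comes from, here is a short sketch using the chain rule. It assumes, beyond what is stated above, that f and g are differentiable and that f has a single interior maximum:

```latex
\frac{d}{dy}\, f(g(y)) = f'(g(y))\, g'(y)
% At the maximizer \hat{y} the derivative vanishes:
f'(g(\hat{y}))\, g'(\hat{y}) = 0
% Since g'(\hat{y}) \neq 0 by assumption, the first factor must vanish:
f'(g(\hat{y})) = 0 \quad\Longrightarrow\quad g(\hat{y}) = \hat{x}
```

No extra factor appears next to f(g(y)), so the location of the maximum is simply carried along by g.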

This is not the case for probability densities. If you have a density p_x(x) and two random variables related by x = g(y), then in general there is no such simple relation between \hat{x} = argmax_x p_x(x) and \hat{y} = argmax_y p_y(y). This happens because of the Jacobian factor |g^\prime(y)| in p_y(y) = p_x(g(y)) |g^\prime(y)|, a factor that measures how much the function g(.) locally stretches or shrinks volume (in one dimension, the length of a small interval).
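A small numerical sketch may make this concrete. The choices here are illustrative assumptions, not taken from Bishop or the discussion above: p_x is a standard Gaussian (mu = 0, sigma = 1), the change of variables is x = g(y) = ln(y) for y > 0, and the maxima are located by a plain grid search. All function names are hypothetical helpers.

```python
import math

def p_x(x, mu=0.0, sigma=1.0):
    """Gaussian density in the variable x (assumed example density)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def g(y):
    """Assumed change of variables x = g(y) = ln(y), valid for y > 0."""
    return math.log(y)

def g_prime(y):
    """Derivative of g; its absolute value is the Jacobian factor."""
    return 1.0 / y

def f_tilde(y):
    """p_x transformed as a plain function: no Jacobian factor."""
    return p_x(g(y))

def p_y(y):
    """p_x transformed as a density: includes the factor |g'(y)|."""
    return p_x(g(y)) * abs(g_prime(y))

def argmax_on_grid(func, lo, hi, n=100_001):
    """Return the grid point in [lo, hi] where func is largest."""
    step = (hi - lo) / (n - 1)
    best = max(range(n), key=lambda i: func(lo + i * step))
    return lo + best * step

x_hat = argmax_on_grid(p_x, -5.0, 5.0)            # mode of p_x
y_hat_plain = argmax_on_grid(f_tilde, 0.01, 5.0)  # maximum of p_x(g(y))
y_hat_density = argmax_on_grid(p_y, 0.01, 5.0)    # mode of p_y

print("x_hat            =", round(x_hat, 3))
print("g(y_hat_plain)   =", round(g(y_hat_plain), 3))
print("g(y_hat_density) =", round(g(y_hat_density), 3))
```

Under these assumptions the plain function peaks at y = 1, which maps back to \hat{x} = 0, whereas p_y is a log-normal density whose mode is at exp(mu - sigma^2) = exp(-1), which maps back to x = -1 rather than 0. The 1/y factor pulls the maximum toward smaller y.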

Source: Link, Question Author: ste, Answer Author: MajidL
