In Bishop’s *Pattern Recognition and Machine Learning* I read the following, just after the probability density $p(x \in (a,b)) = \int_a^b p(x)\,\textrm{d}x$ was introduced:

> Under a nonlinear change of variable, a probability density transforms differently from a simple function, due to the Jacobian factor. For instance, if we consider a change of variables $x = g(y)$, then a function $f(x)$ becomes $\tilde{f}(y) = f(g(y))$. Now consider a probability density $p_x(x)$ that corresponds to a density $p_y(y)$ with respect to the new variable $y$, where the suffices denote the fact that $p_x(x)$ and $p_y(y)$ are different densities. Observations falling in the range $(x, x + \delta x)$ will, for small values of $\delta x$, be transformed into the range $(y, y + \delta y)$ where $p_x(x)\,\delta x \simeq p_y(y)\,\delta y$, and hence $p_y(y) = p_x(x) \left|\frac{dx}{dy}\right| = p_x(g(y)) \, |g'(y)|$.

What is the Jacobian factor and what exactly does everything mean (maybe qualitatively)? Bishop says that a consequence of this property is that the concept of the maximum of a probability density is dependent on the choice of variable. What does this mean?

To me this all comes a bit out of the blue (considering it’s in the introductory chapter). I’d appreciate some hints, thanks!

**Answer**

I suggest reading the solution to Question 1.4, which provides good intuition.

In a nutshell, if you have an arbitrary function $f(x)$ and two variables $x$ and $y$ related by $x = g(y)$, then you can find the maximum of the function either by analyzing $f(x)$ directly, $\hat{x} = \operatorname{argmax}_x f(x)$, or by analyzing the transformed function $f(g(y))$, $\hat{y} = \operatorname{argmax}_y f(g(y))$. Not surprisingly, $\hat{x}$ and $\hat{y}$ are related by $\hat{x} = g(\hat{y})$ (here I assumed that $g'(y) \neq 0$ for all $y$).
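The reason the two maximizers coincide for a plain function can be made explicit with the chain rule. This is only a sketch, under the added assumption that $f$ and $g$ are differentiable:

$$
\frac{d}{dy}\, f(g(y)) = f'(g(y))\, g'(y).
$$

Because $g'(y) \neq 0$, the left-hand side vanishes at $\hat{y}$ exactly when $f'(g(\hat{y})) = 0$, so $g(\hat{y})$ is a stationary point of $f$, which is the statement $\hat{x} = g(\hat{y})$.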

This is not the case for probability densities. If you have a density $p_x(x)$ and two random variables related by $x = g(y)$, then in general there is no such direct relation between $\hat{x} = \operatorname{argmax}_x p_x(x)$ and $\hat{y} = \operatorname{argmax}_y p_y(y)$. This happens because of the Jacobian factor $|g'(y)|$, which measures how much the map $g(\cdot)$ locally stretches or compresses volume.
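A small numerical check can make the difference concrete. The sketch below is illustrative only: the Gaussian $p_x$ with mean 1 and unit variance, the choice $g(y) = \sinh(y)$, and the grid ranges are all assumptions picked for the example, not anything taken from Bishop. It locates the maximizer of the plain transformed function $p_x(g(y))$ and the maximizer of the density $p_y(y) = p_x(g(y))\,|g'(y)|$, and maps both back to $x$.

```python
import numpy as np

def p_x(x, mu=1.0, sigma=1.0):
    """Gaussian density in x (an assumed example density)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def g(y):
    """Assumed change of variables x = g(y); sinh is smooth and strictly increasing."""
    return np.sinh(y)

def g_prime(y):
    """Derivative of g; cosh(y) >= 1, so it is never zero."""
    return np.cosh(y)

def p_y(y):
    """Density in y: the transformed function times the Jacobian factor |g'(y)|."""
    return p_x(g(y)) * np.abs(g_prime(y))

# Dense grids; the ranges are chosen to cover essentially all of the probability mass.
x_grid = np.linspace(-5.0, 5.0, 200001)
y_grid = np.linspace(-3.0, 3.0, 200001)
dy = y_grid[1] - y_grid[0]

x_hat = x_grid[np.argmax(p_x(x_grid))]                # mode of p_x
y_hat_function = y_grid[np.argmax(p_x(g(y_grid)))]    # maximizer of f(g(y)), no Jacobian
y_hat_density = y_grid[np.argmax(p_y(y_grid))]        # mode of p_y, with Jacobian

# Riemann-sum check that p_y is still (approximately) normalized.
mass = np.sum(p_y(y_grid)) * dy

print("mode of p_x:                      ", x_hat)
print("g(argmax of p_x(g(y))):           ", g(y_hat_function))
print("g(mode of p_y):                   ", g(y_hat_density))
print("integral of p_y over the y grid:  ", mass)
```

With these assumed choices, the maximizer of the plain function maps back to the mode of $p_x$ at $x = 1$, while the mode of $p_y$ maps back to roughly $x \approx 1.47$. The Jacobian factor $\cosh(y)$ grows with $|y|$, so it pulls the maximum of $p_y$ away from the point that corresponds to the mode of $p_x$.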

**Attribution**

*Source: Link, Question Author: ste, Answer Author: MajidL*