What’s the distribution of (a−d)2+4bc(a-d)^2+4bc, where a,b,c,da,b,c,d are uniform distributions?

I have four independent uniformly distributed variables a,b,c,d, each in
[0,1]. I want to calculate the distribution of (ad)2+4bc. I computed the distribution of u2=4bc to be f2(u2)=14lnu24 (hence u2(0,4]), and of u1=(ad)2 to be f1(u1)=1u1u1. Now, the distribution of a sum u1+u2 is (u1,u2 are also independent) fu1+u2(x)=+f1(xy)f2(y)dy=14401xyxylny4dy, because y(0,4]. Here, it has to be x>y so the integral is equal to fu1+u2(x)=14x01xyxylny4dy. Now I insert it to Mathematica and get that fu1+u2(x)=14[x+xlnx42x(2+lnx)].

I made four independent sets a,b,c,d consisting of 106 numbers each and drew a histogram of (ad)2+4bc:

enter image description here

and drew a plot of fu1+u2(x):

enter image description here

Generally, the plot is similar to the histogram, but on the interval (0,5) most of it is negative (the root is at 2.27034). And the integral of the positive part is 0.77.

Where’s the mistake? Or where am I missing something?

EDIT: I scaled the histogram to show the PDF.

enter image description here

EDIT 2: I think I know where’s the problem in my reasoning – in the integration limits. Because y(0,4] and xy(0,1], I cannot simply x0. The plot shows the region I have to integrate in:

enter image description here

This means I have x0 for y(0,1] (that’s why part of my f was correct), xx1 in y(1,4], and 4x1 in y(4,5]. Unfortunately, Mathematica fails to compute the latter two integrals (well, it does calculate the second, by there’s an imaginary unit in the output that spoils everything…).

EDIT 3: It appears that Mathematica CAN compute the last three integrals with the following code:

(1/4)*Integrate[((1-Sqrt[u1-u2])*Log[4/u2])/Sqrt[u1-u2],{u2,0,u1},
Assumptions ->0 <= u2 <= u1 && u1 > 0]

(1/4)*Integrate[((1-Sqrt[u1-u2])*Log[4/u2])/Sqrt[u1-u2],{u2,u1-1,u1},
Assumptions -> 1 <= u2 <= 3 && u1 > 0]

(1/4)*Integrate[((1-Sqrt[u1-u2])*Log[4/u2])/Sqrt[u1-u2],{u2,u1-1,4},
Assumptions -> 4 <= u2 <= 4 && u1 > 0]

which gives a correct answer 🙂

Answer

Often it helps to use cumulative distribution functions.

First,

F(x)=Pr

Next,

G(y) = \Pr(4 b c \le y) = \Pr(b c \le \frac{y}{4}) = \int_0^{y/4} dt + \int_{y/4}^1\frac{y\,dt}{4t} = \frac{y}{4}\left(1 – \log\left(\frac{y}{4}\right)\right).

Let \delta range between the smallest (0) and largest (5) possible values of (a-d)^2 + 4 b c. Writing x=(a-d)^2 with CDF F and y=4 b c with PDF g = G^\prime, we need to compute

H(\delta) = \Pr((a-d)^2 + 4 b c \le \delta) = \Pr(x\le \delta-y) = \int_0^4 F(\delta-y)g(y)dy.

We can expect this to be nasty–the uniform distribution PDF is discontinuous and thus ought to produce breaks in the definition of H–so it is somewhat amazing that Mathematica obtains a closed form (which I will not reproduce here). Differentiating it with respect to \delta gives the desired density. It is defined piecewise within three intervals. In 0 \lt \delta \lt 1,

H^\prime(\delta) = h(\delta) = \frac{1}{8} \left(8 \sqrt{\delta }+\delta (-(2+\log (16)))+2 \left(\delta -2 \sqrt{\delta }\right) \log (\delta )\right).

In 1 \lt \delta \lt 4,

h(\delta) = \frac{1}{4} \left(-(\delta +1) \log (\delta -1)+\delta \log (\delta )-4 \sqrt{\delta } \coth ^{-1}\left(\sqrt{\delta }\right)+3+\log (4)\right).

And in 4 \lt \delta \lt 5,

\eqalign{
&h(\delta) = \\
&\frac{1}{4}\left(\delta -4 \sqrt{\delta -4}+(\delta +1) \log \left(\frac{4}{\delta -1}\right)+4 \sqrt{\delta } \tanh ^{-1}\left(\frac{\sqrt{(\delta -4) \delta }-\sqrt{\delta }}{\delta -\sqrt{\delta -4}}\right)-1\right).
}

Figure

This figure overlays a plot of h on a histogram of 10^6 iid realizations of (a-d)^2 + 4bc. The two are almost indistinguishable, suggesting the correctness of the formula for h.


The following is a nearly mindless, brute-force Mathematica solution. It automates practically everything about the calculation. For instance, it will even compute the range of the resulting variable:

ClearAll[ a, b, c, d, ff, gg, hh, g, h, x, y, z, zMin, zMax, assumptions];
assumptions = 0 <= a <= 1 && 0 <= b <= 1 && 0 <= c <= 1 && 0 <= d <= 1; 
zMax = First@Maximize[{(a - d)^2 + 4 b c, assumptions}, {a, b, c, d}];
zMin = First@Minimize[{(a - d)^2 + 4 b c, assumptions}, {a, b, c, d}];

Here is all the integration and differentiation. (Be patient; computing H takes a couple of minutes.)

ff[x_] := Evaluate@FullSimplify@Integrate[Boole[(a - d)^2 <= x], {a, 0, 1}, {d, 0, 1}];
gg[y_] := Evaluate@FullSimplify@Integrate[Boole[4 b c <= y], {b, 0, 1}, {c, 0, 1}];
g[y_]  := Evaluate@FullSimplify@D[gg[y], y];
hh[z_] := Evaluate@FullSimplify@Integrate[ff[-y + z] g[y], {y, 0, 4}, 
          Assumptions -> zMin <= z <= zMax];
h[z_]  :=  Evaluate@FullSimplify@D[hh[z], z];

Finally, a simulation and comparison to the graph of h:

x = RandomReal[{0, 1}, {4, 10^6}];
x = (x[[1, All]] - x[[4, All]])^2 + 4 x[[2, All]] x[[3, All]];
Show[Histogram[x, {.1}, "PDF"], 
 Plot[h[z], {z, zMin, zMax}, Exclusions -> {1, 4}], 
 AxesLabel -> {"\[Delta]", "Density"}, BaseStyle -> Medium, 
 Ticks -> {{{0, "0"}, {1, "1"}, {4, "4"}, {5, "5"}}, Automatic}]

Attribution
Source : Link , Question Author : corey979 , Answer Author : Community

Leave a Comment