If I have two independent normally distributed random variables $X$ and $Y$ with means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, and I discover that $X+Y=c$, then (assuming I have not made any errors) the conditional distributions of $X$ and $Y$ given $c$ are also normal, with means

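$$\mu_{X|c} = \mu_X + (c-\mu_X-\mu_Y)\frac{\sigma_X^2}{\sigma_X^2+\sigma_Y^2}, \qquad \mu_{Y|c} = \mu_Y + (c-\mu_X-\mu_Y)\frac{\sigma_Y^2}{\sigma_X^2+\sigma_Y^2}$$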

and standard deviation

$$\sigma_{X|c} = \sigma_{Y|c} = \sqrt{\frac{\sigma_X^2\,\sigma_Y^2}{\sigma_X^2+\sigma_Y^2}}.$$

It is no surprise that the conditional standard deviations are equal: given $c$, if one variable goes up the other must come down by the same amount. It is interesting that the conditional standard deviation does not depend on $c$.

What I cannot get my head round is that the conditional means each take a share of the excess $(c-\mu_X-\mu_Y)$ proportional to the original variances, not to the original standard deviations.

For example, if they have zero means, $\mu_X=\mu_Y=0$, and standard deviations $\sigma_X=3$ and $\sigma_Y=1$, then conditioned on $c=4$ we would have $E[X \mid c=4]=3.6$ and $E[Y \mid c=4]=0.4$, i.e. in the ratio $9:1$, even though I would have intuitively thought that the ratio $3:1$ would be more natural.
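The numbers in this example can be checked with a few lines of Python. This is only a minimal sketch of the formulas stated above; the function name `conditional_given_sum` and its argument names are made up for illustration.

```python
import math


def conditional_given_sum(mu_x, mu_y, sd_x, sd_y, c):
    """Conditional means and common SD of independent normals X, Y given X + Y = c.

    Hypothetical helper: it simply evaluates the formulas quoted in the question.
    """
    var_x, var_y = sd_x ** 2, sd_y ** 2
    total_var = var_x + var_y
    excess = c - mu_x - mu_y  # how far the observed sum is from its expected value

    # Each variable absorbs a share of the excess proportional to its variance.
    mean_x = mu_x + excess * var_x / total_var
    mean_y = mu_y + excess * var_y / total_var

    # The conditional SD is the same for both variables and does not involve c.
    sd = math.sqrt(var_x * var_y / total_var)
    return mean_x, mean_y, sd


mean_x, mean_y, sd = conditional_given_sum(0.0, 0.0, 3.0, 1.0, 4.0)
print(mean_x, mean_y, sd)  # roughly 3.6, 0.4 and 0.949
```

The two conditional means always add up to $c$, and their ratio is $\sigma_X^2 : \sigma_Y^2$ whenever the original means are zero.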

Can anyone give an intuitive explanation for this?

This was provoked by a Math.SE question.

**Answer**

The question readily reduces to the case $\mu_X=\mu_Y=0$ by looking at $X-\mu_X$ and $Y-\mu_Y$.

Clearly the conditional distributions are Normal. Thus, the mean, median, and mode of each are coincident. The modes will occur at the coordinates of a local maximum of the bivariate PDF of $X$ and $Y$ constrained to the curve $g(x,y)=x+y=c$. This implies the contour of the bivariate PDF at this location and the constraint curve have parallel tangents. (This is the theory of Lagrange multipliers.) Because the equation of any contour is of the form $f(x,y)=x^2/(2\sigma_X^2)+y^2/(2\sigma_Y^2)=\rho$ for some constant $\rho$ (that is, all contours are ellipses), their gradients must be parallel, whence there exists $\lambda$ such that

$$\left(\frac{x}{\sigma_X^2},\ \frac{y}{\sigma_Y^2}\right)=\nabla f(x,y)=\lambda\,\nabla g(x,y)=\lambda\,(1,1).$$

It follows immediately that the *modes* of the conditional distributions (and therefore also the means) are determined by the ratio of the variances, not of the SDs.
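To spell out that last step in the reduced zero-mean case (a short worked sketch, using only the gradient condition and the constraint $x+y=c$):

$$x=\lambda\sigma_X^2,\qquad y=\lambda\sigma_Y^2,\qquad x+y=c \;\Longrightarrow\; \lambda=\frac{c}{\sigma_X^2+\sigma_Y^2},$$

$$x=c\,\frac{\sigma_X^2}{\sigma_X^2+\sigma_Y^2},\qquad y=c\,\frac{\sigma_Y^2}{\sigma_X^2+\sigma_Y^2},\qquad \frac{x}{y}=\frac{\sigma_X^2}{\sigma_Y^2}.$$

With $\sigma_X=3$, $\sigma_Y=1$, and $c=4$ this gives $\lambda=0.4$, $x=3.6$, and $y=0.4$, matching the values in the question.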

This analysis works for correlated X and Y as well and it applies to any linear constraints, not just the sum.
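As a rough numerical illustration of the correlated case, the sketch below grid-searches for the point of highest joint density on the line $x+y=c$ and compares it with the standard regression formula for jointly Normal variables, $E[X \mid X+Y=c]=c\,\operatorname{Cov}(X,X+Y)/\operatorname{Var}(X+Y)$ when the means are zero. The function names, the correlation parameter `corr`, and the grid settings are all assumptions made for this example, not part of the original answer.

```python
def neg_log_density(x, y, sd_x, sd_y, corr):
    """Negative log of a zero-mean bivariate Normal PDF, up to an additive constant."""
    zx, zy = x / sd_x, y / sd_y
    return (zx * zx - 2.0 * corr * zx * zy + zy * zy) / (2.0 * (1.0 - corr * corr))


def mode_on_line(c, sd_x, sd_y, corr=0.0, lo=-20.0, hi=20.0, steps=40_000):
    """Grid-search the point on x + y = c where the joint density is largest."""
    best_x, best_val = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        val = neg_log_density(x, c - x, sd_x, sd_y, corr)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, c - best_x


def regression_mean_x(c, sd_x, sd_y, corr=0.0):
    """Closed form c * Cov(X, X+Y) / Var(X+Y) for zero-mean jointly Normal X, Y."""
    cov_x_sum = sd_x ** 2 + corr * sd_x * sd_y
    var_sum = sd_x ** 2 + sd_y ** 2 + 2.0 * corr * sd_x * sd_y
    return c * cov_x_sum / var_sum


for corr in (0.0, 0.5):
    x_mode, y_mode = mode_on_line(4.0, 3.0, 1.0, corr)
    print(corr, round(x_mode, 3), round(y_mode, 3), regression_mean_x(4.0, 3.0, 1.0, corr))
```

With `corr = 0` the constrained mode lands at about $(3.6, 0.4)$, as in the question; with a nonzero correlation the split changes, but the grid-search mode and the closed form should still agree to within the grid spacing.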

**Attribution**

*Source: Link, Question Author: Henry, Answer Author: whuber*