Suppose that X, Y, and Z are random variables. X and Y are positively correlated and Y and Z are likewise positively correlated. Does it follow that X and Z must be positively correlated?
Not in general. We can prove, however, that if the two known correlations are sufficiently close to 1, then X and Z must be positively correlated.
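Let C(x,y) denote the correlation coefficient between x and y, and likewise C(z,x) and C(y,z). The variable labels are interchangeable, so below we treat C(y,z) and C(z,x) as the two known correlations and ask what they imply about the third one, C(x,y). Because the correlation matrix of three variables must be positive semidefinite, the third correlation cannot fall below the following lower bound (the matching upper bound has a plus sign before the square root):
C(x,y) ≥ C(y,z) * C(z,x) – Square Root ( (1 – C(y,z)^2 ) * (1 – C(z,x)^2 ) )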
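To make the relation concrete, here is a minimal sketch that computes the feasible range for the third correlation, assuming the relation above is read as the lower end of that range and the upper end uses a plus sign before the square root. The function name `third_correlation_bounds` and the parameter names `r_yz` and `r_zx` are hypothetical, chosen only for this illustration.

```python
import math

def third_correlation_bounds(r_yz, r_zx):
    """Return (low, high), the feasible range for C(x,y) given C(y,z) and C(z,x)."""
    product = r_yz * r_zx
    # max(0.0, ...) guards against a tiny negative value caused by rounding
    slack = math.sqrt(max(0.0, (1 - r_yz**2) * (1 - r_zx**2)))
    return product - slack, product + slack

# Two strong positive correlations force the third to be positive
low, high = third_correlation_bounds(0.9, 0.9)
print(low, high)
```

With both known correlations at 0.9 the lower end of the range is about 0.62, so the third correlation has to be positive.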
Now, for C(x,y) to be guaranteed positive, the right-hand side of the above relation must be positive. Hence, we need to solve:
C(y,z) * C(z,x) > Square Root ( (1 – C(y,z)^2 ) * (1 – C(z,x)^2 ) )
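We can handle both cases together, the one where the two known correlations have the same sign and the one where they have opposite signs (which uses the matching upper bound, with a plus sign before the square root), by squaring both sides. Squaring gives C(y,z)^2 * C(z,x)^2 > (1 – C(y,z)^2) * (1 – C(z,x)^2), and the product term cancels once the right-hand side is expanded. The result is that the sign of C(x,y) is fixed (positive if the two known correlations have the same sign, negative if their signs differ) whenever the following inequality holds: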
C(y,z) ^ 2 + C(z,x) ^ 2 > 1
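The rule can be sketched as a small function. This is only an illustration under the assumption that both inputs are valid correlations in [-1, 1]; the name `guaranteed_sign` is hypothetical.

```python
def guaranteed_sign(r_yz, r_zx):
    """Return +1 or -1 if the sign of C(x,y) is forced, or 0 if it is not."""
    # On or inside the unit circle nothing can be concluded
    if r_yz**2 + r_zx**2 <= 1:
        return 0
    # Outside the circle the sign follows the sign of the product
    return 1 if r_yz * r_zx > 0 else -1

print(guaranteed_sign(0.9, 0.9))    # same sign, outside the circle
print(guaranteed_sign(0.9, -0.9))   # opposite signs, outside the circle
print(guaranteed_sign(0.5, 0.5))    # inside the circle
```

Outside the circle neither input can be zero (that would require the other to exceed 1 in absolute value), so the sign of the product is always well defined there.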
Wow, the boundary of this region is a circle: the unit circle in the plane whose axes are C(y,z) and C(z,x). Hence the following plot will explain everything:
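If the two known correlations are in the A zone (outside the circle, where both have the same sign), the third correlation will be positive. If they lie in the B zone (outside the circle, where their signs differ), the third correlation will be negative. Inside the circle, we cannot say anything about the sign of the third correlation. A very interesting insight here is that even if C(y,z) and C(z,x) are both 0.5, C(x,y) can actually be negative: the lower bound is 0.5 * 0.5 – Square Root ( 0.75 * 0.75 ) = -0.5.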
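As a sanity check on the 0.5 example, the sketch below tests whether three pairwise correlations can coexist. It assumes the standard requirement that a correlation matrix be positive semidefinite; for a 3x3 matrix with ones on the diagonal and entries in [-1, 1], that reduces to a non-negative determinant. The function name `is_valid_correlation_triple` and the tolerance `tol` are hypothetical choices for this illustration.

```python
def is_valid_correlation_triple(r_xy, r_yz, r_zx, tol=1e-12):
    """Check whether three pairwise correlations form a valid correlation matrix."""
    if any(abs(r) > 1 for r in (r_xy, r_yz, r_zx)):
        return False
    # Determinant of [[1, r_xy, r_zx], [r_xy, 1, r_yz], [r_zx, r_yz, 1]]
    det = 1 + 2 * r_xy * r_yz * r_zx - r_xy**2 - r_yz**2 - r_zx**2
    # tol allows for rounding error when the determinant is exactly on the boundary
    return det >= -tol

print(is_valid_correlation_triple(-0.5, 0.5, 0.5))  # True: sits exactly on the lower bound
print(is_valid_correlation_triple(-0.6, 0.5, 0.5))  # False: below the lower bound
```

One concrete way to reach the -0.5 case: take X and Y with unit variance and correlation -0.5, and set Z = X + Y. Then Z also has unit variance and a correlation of 0.5 with each of X and Y.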