Why does this set of data have no covariance?

My understanding of how covariance works is that data that are correlated should have a somewhat high covariance. I’ve come across a situation where my data looks correlated (as shown in the scatter plot) but the covariance is near-zero. How can the covariance of the data be zero if they are correlated?

import numpy as np
x1 = np.array([ 0.03551153,  0.01656052,  0.03344669,  0.02551755,  0.02344788,
        0.02904475,  0.03334179,  0.02683399,  0.02966126,  0.03947681,
        0.02537157,  0.03015175,  0.02206443,  0.03590149,  0.03702152,
        0.02697212,  0.03777607,  0.02468797,  0.03489873,  0.02167536])
x2 = np.array([ 0.0372599 ,  0.02398212,  0.03649548,  0.03145494,  0.02925334,
        0.03328783,  0.03638871,  0.03196318,  0.03347346,  0.03874528,
        0.03098697,  0.03357531,  0.02808358,  0.03747998,  0.03804655,
        0.03213286,  0.03827639,  0.02999955,  0.0371424 ,  0.0279254 ])
print(np.cov(x1, x2))

array([[  3.95773132e-05,   2.59159589e-05],
       [  2.59159589e-05,   1.72006225e-05]])

[Image: scatter plot of the data]

Answer

The magnitude of the covariance depends on the scale of the data and on how widely the data points are scattered around their means. It’s easy to see when you look at the formula:

$$\operatorname{cov}_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
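To illustrate the dependence on scale, here is a minimal sketch in Python with NumPy. The toy arrays `x` and `y` and the factor `scale` are made up for illustration and are not the data from the question. Rescaling both variables by a factor c should multiply the covariance by c² while leaving the correlation unchanged.

```python
import numpy as np

# Toy data, invented for illustration only (not the question's data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

cov_xy = np.cov(x, y)[0, 1]        # sample covariance (divides by n - 1)
r_xy = np.corrcoef(x, y)[0, 1]     # Pearson correlation

# Shrink both variables by the same (hypothetical) factor.
scale = 0.01
cov_scaled = np.cov(scale * x, scale * y)[0, 1]
r_scaled = np.corrcoef(scale * x, scale * y)[0, 1]

print("covariance:", cov_xy, "->", cov_scaled)   # shrinks by scale**2
print("correlation:", r_xy, "->", r_scaled)      # stays the same
```

So a tiny covariance can simply mean that the numbers themselves are tiny, which is the situation in the question.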

In your case, the deviations of the x1 and x2 data points from their respective means are:

x1-mean(x1)
 [1]  0.006043341 -0.012907669  0.003978501 -0.003950639 -0.006020309 -0.000423439  0.003873601
 [8] -0.002634199  0.000193071  0.010008621 -0.004096619  0.000683561 -0.007403759  0.006433301
[15]  0.007553331 -0.002496069  0.008307881 -0.004780219  0.005430541 -0.007792829

x2-mean(x2)
 [1]  0.0039622385 -0.0093155415  0.0031978185 -0.0018427215 -0.0040443215 -0.0000098315
 [7]  0.0030910485 -0.0013344815  0.0001757985  0.0054476185 -0.0023106915  0.0002776485
[13] -0.0052140815  0.0041823185  0.0047488885 -0.0011648015  0.0049787285 -0.0032981115
[19]  0.0038447385 -0.0053722615

If you now multiply those two vectors element-wise, you get very small numbers, because each factor is itself small:

(x1-mean(x1)) * (x2-mean(x2))
 [1] 2.394516e-05 1.202419e-04 1.272252e-05 7.279927e-06 2.434807e-05 4.163041e-09 1.197349e-05
 [8] 3.515290e-06 3.394159e-08 5.452315e-05 9.466023e-06 1.897897e-07 3.860380e-05 2.690611e-05
[15] 3.586993e-05 2.907425e-06 4.136268e-05 1.576570e-05 2.087901e-05 4.186512e-05

Now take the sum and divide by n − 1 and you have the covariance:

sum((x1-mean(x1)) * (x2-mean(x2))) / (length(x1)-1)
[1] 2.591596e-05
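For readers following along in Python, as in the question, the same steps might be sketched with NumPy like this. The names `dx1`, `dx2`, `products`, `cov_manual` and `cov_numpy` are illustrative choices, not part of the original post.

```python
import numpy as np

# Data copied from the question.
x1 = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788,
               0.02904475, 0.03334179, 0.02683399, 0.02966126, 0.03947681,
               0.02537157, 0.03015175, 0.02206443, 0.03590149, 0.03702152,
               0.02697212, 0.03777607, 0.02468797, 0.03489873, 0.02167536])
x2 = np.array([0.0372599, 0.02398212, 0.03649548, 0.03145494, 0.02925334,
               0.03328783, 0.03638871, 0.03196318, 0.03347346, 0.03874528,
               0.03098697, 0.03357531, 0.02808358, 0.03747998, 0.03804655,
               0.03213286, 0.03827639, 0.02999955, 0.0371424, 0.0279254])

# Deviations of each point from its variable's mean.
dx1 = x1 - x1.mean()
dx2 = x2 - x2.mean()

# Element-wise products, summed and divided by n - 1.
products = dx1 * dx2
cov_manual = products.sum() / (len(x1) - 1)

# Off-diagonal entry of NumPy's covariance matrix, for comparison.
cov_numpy = np.cov(x1, x2)[0, 1]

print(cov_manual, cov_numpy)
```

Both values should agree with the off-diagonal entry of the matrix printed in the question.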

That is why the magnitude of the covariance alone says little about how strongly x1 and x2 co-vary. By standardizing (or normalizing) the covariance, that is, dividing it by the product of the standard deviations of x1 and x2 (which here is very close to the covariance itself, namely 2.609127e-05),

$$r = \frac{\operatorname{cov}_{x,y}}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$$

you get a high correlation coefficient of r = 0.99, which confirms what you can see in your plot.
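As a rough check in Python, the standardization can be applied directly to the covariance matrix printed in the question. This is only a sketch; the names `cov`, `s`, `corr` and `r` are illustrative. With the raw data at hand, `np.corrcoef(x1, x2)` would be the more direct route.

```python
import numpy as np

# Covariance matrix exactly as printed in the question.
cov = np.array([[3.95773132e-05, 2.59159589e-05],
                [2.59159589e-05, 1.72006225e-05]])

# Standard deviations are the square roots of the diagonal (the variances).
s = np.sqrt(np.diag(cov))

# Divide each covariance by the product of the two standard deviations.
corr = cov / np.outer(s, s)
r = corr[0, 1]

print("product of standard deviations:", s[0] * s[1])
print("correlation coefficient r:", r)   # about 0.99, as stated above
```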

Attribution
Source : Link , Question Author : kilojoules , Answer Author : Stefan
