# Why does this set of data have no covariance?

My understanding of how covariance works is that data that are correlated should have a somewhat high covariance. I’ve come across a situation where my data looks correlated (as shown in the scatter plot) but the covariance is near-zero. How can the covariance of the data be zero if they are correlated?

import numpy as np
x1 = np.array([ 0.03551153,  0.01656052,  0.03344669,  0.02551755,  0.02344788,
0.02904475,  0.03334179,  0.02683399,  0.02966126,  0.03947681,
0.02537157,  0.03015175,  0.02206443,  0.03590149,  0.03702152,
0.02697212,  0.03777607,  0.02468797,  0.03489873,  0.02167536])
x2 = np.array([ 0.0372599 ,  0.02398212,  0.03649548,  0.03145494,  0.02925334,
0.03328783,  0.03638871,  0.03196318,  0.03347346,  0.03874528,
0.03098697,  0.03357531,  0.02808358,  0.03747998,  0.03804655,
0.03213286,  0.03827639,  0.02999955,  0.0371424 ,  0.0279254 ])
print(np.cov(x1, x2))

array([[  3.95773132e-05,   2.59159589e-05],
       [  2.59159589e-05,   1.72006225e-05]])

The magnitude of the covariance depends on the magnitude of the data and on how widely the data points are scattered around their means. This is easy to see from the formula:

$cov_{x,y}= \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{n-1}$
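As a quick illustration of that dependence on scale, here is a minimal NumPy sketch. It uses only the first five values of x1 and x2 from the question; the names `x`, `y` and the factor `scale` are my own choices for illustration. Multiplying both variables by 100 multiplies the covariance by 100², while the correlation does not change.

```python
import numpy as np

# First five values of x1 and x2 from the question (a subset, for brevity).
x = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788])
y = np.array([0.0372599, 0.02398212, 0.03649548, 0.03145494, 0.02925334])

scale = 100.0  # arbitrary rescaling factor, chosen for illustration

# Off-diagonal entry of the 2x2 covariance matrix is cov(x, y).
cov_small = np.cov(x, y)[0, 1]
cov_big = np.cov(scale * x, scale * y)[0, 1]

# The correlation coefficient is unaffected by rescaling.
r_small = np.corrcoef(x, y)[0, 1]
r_big = np.corrcoef(scale * x, scale * y)[0, 1]

print(cov_small, cov_big)  # the second is scale**2 times the first
print(r_small, r_big)      # identical up to floating-point rounding
```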

In your case, the deviations of the x1 and x2 data points from their respective means are:

x1-mean(x1)
  0.006043341 -0.012907669  0.003978501 -0.003950639 -0.006020309 -0.000423439  0.003873601
 -0.002634199  0.000193071  0.010008621 -0.004096619  0.000683561 -0.007403759  0.006433301
  0.007553331 -0.002496069  0.008307881 -0.004780219  0.005430541 -0.007792829

x2-mean(x2)
  0.0039622385 -0.0093155415  0.0031978185 -0.0018427215 -0.0040443215 -0.0000098315
  0.0030910485 -0.0013344815  0.0001757985  0.0054476185 -0.0023106915  0.0002776485
 -0.0052140815  0.0041823185  0.0047488885 -0.0011648015  0.0049787285 -0.0032981115
  0.0038447385 -0.0053722615


If you multiply those two vectors element-wise, you get very small numbers, because each one is the product of two small deviations:

(x1-mean(x1)) * (x2-mean(x2))
 2.394516e-05 1.202419e-04 1.272252e-05 7.279927e-06 2.434807e-05 4.163041e-09 1.197349e-05
 3.515290e-06 3.394159e-08 5.452315e-05 9.466023e-06 1.897897e-07 3.860380e-05 2.690611e-05
 3.586993e-05 2.907425e-06 4.136268e-05 1.576570e-05 2.087901e-05 4.186512e-05


Now take the sum and divide by $n-1$ and you have the covariance:

sum((x1-mean(x1)) * (x2-mean(x2))) / (length(x1)-1)
 2.591596e-05
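The same steps can be reproduced in NumPy, the library used in the question. This is only a sketch; the names `dx`, `dy` and `cov_manual` are mine, not part of any API.

```python
import numpy as np

x1 = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788,
               0.02904475, 0.03334179, 0.02683399, 0.02966126, 0.03947681,
               0.02537157, 0.03015175, 0.02206443, 0.03590149, 0.03702152,
               0.02697212, 0.03777607, 0.02468797, 0.03489873, 0.02167536])
x2 = np.array([0.0372599, 0.02398212, 0.03649548, 0.03145494, 0.02925334,
               0.03328783, 0.03638871, 0.03196318, 0.03347346, 0.03874528,
               0.03098697, 0.03357531, 0.02808358, 0.03747998, 0.03804655,
               0.03213286, 0.03827639, 0.02999955, 0.0371424, 0.0279254])

# Deviations of each point from its own mean.
dx = x1 - x1.mean()
dy = x2 - x2.mean()

# Sum of the element-wise products, divided by n - 1 (sample covariance).
cov_manual = np.sum(dx * dy) / (len(x1) - 1)

print(cov_manual)            # about 2.59e-05
print(np.cov(x1, x2)[0, 1])  # np.cov also divides by n - 1 by default
```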


That is why the magnitude of the covariance alone says little about how strongly x1 and x2 co-vary. By standardizing (or normalizing) the covariance, that is, dividing it by the product of the standard deviations of x1 and x2 (here 2.609127e-05, very close to the covariance itself),

$r=\frac{cov_{x,y}}{s_x s_y} = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{(n-1) s_x s_y}$

you get a high correlation coefficient of $r=0.99$, which confirms what you can see in your plot.
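For completeness, the standardization can be checked in NumPy as well. A minimal sketch, with `s1`, `s2` and `r` as illustrative names of my own; `ddof=1` is passed so the standard deviations use the same $n-1$ denominator as the covariance.

```python
import numpy as np

x1 = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788,
               0.02904475, 0.03334179, 0.02683399, 0.02966126, 0.03947681,
               0.02537157, 0.03015175, 0.02206443, 0.03590149, 0.03702152,
               0.02697212, 0.03777607, 0.02468797, 0.03489873, 0.02167536])
x2 = np.array([0.0372599, 0.02398212, 0.03649548, 0.03145494, 0.02925334,
               0.03328783, 0.03638871, 0.03196318, 0.03347346, 0.03874528,
               0.03098697, 0.03357531, 0.02808358, 0.03747998, 0.03804655,
               0.03213286, 0.03827639, 0.02999955, 0.0371424, 0.0279254])

cov12 = np.cov(x1, x2)[0, 1]   # sample covariance (n - 1 denominator)
s1 = np.std(x1, ddof=1)        # sample standard deviation of x1
s2 = np.std(x2, ddof=1)        # sample standard deviation of x2

# Pearson correlation: covariance divided by the product of the std devs.
r = cov12 / (s1 * s2)

print(s1 * s2)                    # about 2.61e-05, close to the covariance
print(r)                          # roughly 0.99
print(np.corrcoef(x1, x2)[0, 1])  # NumPy's built-in gives the same value
```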