My understanding of how covariance works is that data that are correlated should have a somewhat high covariance. I’ve come across a situation where my data looks correlated (as shown in the scatter plot) but the covariance is near-zero. How can the covariance of the data be zero if they are correlated?

```
import numpy as np

x1 = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788,
               0.02904475, 0.03334179, 0.02683399, 0.02966126, 0.03947681,
               0.02537157, 0.03015175, 0.02206443, 0.03590149, 0.03702152,
               0.02697212, 0.03777607, 0.02468797, 0.03489873, 0.02167536])
x2 = np.array([0.0372599 , 0.02398212, 0.03649548, 0.03145494, 0.02925334,
               0.03328783, 0.03638871, 0.03196318, 0.03347346, 0.03874528,
               0.03098697, 0.03357531, 0.02808358, 0.03747998, 0.03804655,
               0.03213286, 0.03827639, 0.02999955, 0.0371424 , 0.0279254 ])

print(np.cov(x1, x2))
# Reported covariance matrix:
# [[3.95773132e-05, 2.59159589e-05],
#  [2.59159589e-05, 1.72006225e-05]]
```

**Answer**

The magnitude of the covariance depends on the scale of the data, i.e. on how far the data points are scattered around their means. This is easy to see from the formula:

$$\operatorname{cov}_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
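As a minimal sketch of the formula in Python/NumPy (the library used in the question), the hypothetical helper `sample_cov` below implements it directly on made-up toy data and checks it against `np.cov`. It also shrinks both variables by a factor of 1000, an arbitrary illustrative choice, to show that the covariance shrinks by $1000^2$ even though the relationship between the variables is unchanged.

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Hypothetical toy data with a perfect linear relationship (y = 2x)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x

print(sample_cov(x, y))          # 10/3, about 3.333
print(np.cov(x, y)[0, 1])        # NumPy's off-diagonal entry agrees

# Same relationship, but both variables 1000 times smaller:
# the covariance becomes 1000 * 1000 times smaller
print(sample_cov(x / 1000, y / 1000))  # about 3.333e-06
```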

In your case, the deviations of the `x1` and `x2` data points from their respective means are:

```
x1-mean(x1)
[1] 0.006043341 -0.012907669 0.003978501 -0.003950639 -0.006020309 -0.000423439 0.003873601
[8] -0.002634199 0.000193071 0.010008621 -0.004096619 0.000683561 -0.007403759 0.006433301
[15] 0.007553331 -0.002496069 0.008307881 -0.004780219 0.005430541 -0.007792829
x2-mean(x2)
[1] 0.0039622385 -0.0093155415 0.0031978185 -0.0018427215 -0.0040443215 -0.0000098315
[7] 0.0030910485 -0.0013344815 0.0001757985 0.0054476185 -0.0023106915 0.0002776485
[13] -0.0052140815 0.0041823185 0.0047488885 -0.0011648015 0.0049787285 -0.0032981115
[19] 0.0038447385 -0.0053722615
```

Multiplying these two vectors element-wise gives very small numbers, since each factor is already small:

```
(x1-mean(x1)) * (x2-mean(x2))
[1] 2.394516e-05 1.202419e-04 1.272252e-05 7.279927e-06 2.434807e-05 4.163041e-09 1.197349e-05
[8] 3.515290e-06 3.394159e-08 5.452315e-05 9.466023e-06 1.897897e-07 3.860380e-05 2.690611e-05
[15] 3.586993e-05 2.907425e-06 4.136268e-05 1.576570e-05 2.087901e-05 4.186512e-05
```

Summing these products and dividing by $n-1$ gives the covariance:

```
sum((x1-mean(x1)) * (x2-mean(x2))) / (length(x1)-1)
[1] 2.591596e-05
```
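The same three steps (deviations, element-wise products, sum divided by $n-1$) can be reproduced in Python with the question's data. This is a sketch of the R computation above, not the author's code; the intermediate variable names are illustrative.

```python
import numpy as np

# Data from the question
x1 = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788,
               0.02904475, 0.03334179, 0.02683399, 0.02966126, 0.03947681,
               0.02537157, 0.03015175, 0.02206443, 0.03590149, 0.03702152,
               0.02697212, 0.03777607, 0.02468797, 0.03489873, 0.02167536])
x2 = np.array([0.0372599, 0.02398212, 0.03649548, 0.03145494, 0.02925334,
               0.03328783, 0.03638871, 0.03196318, 0.03347346, 0.03874528,
               0.03098697, 0.03357531, 0.02808358, 0.03747998, 0.03804655,
               0.03213286, 0.03827639, 0.02999955, 0.0371424, 0.0279254])

# Step 1: deviations of each variable from its own mean
dev1 = x1 - x1.mean()
dev2 = x2 - x2.mean()

# Step 2: element-wise products of the deviations (the largest is about 1.2e-04)
products = dev1 * dev2

# Step 3: sum the products and divide by n - 1
cov_manual = products.sum() / (len(x1) - 1)

print(cov_manual)            # about 2.5916e-05, matching the R result
print(np.cov(x1, x2)[0, 1])  # NumPy's off-diagonal entry gives the same value
```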

This is why the magnitude of the covariance says little about how strongly `x1` and `x2` co-vary. If you standardize (or normalize) the covariance, i.e. divide it by the product of the standard deviations of `x1` and `x2` (here `2.609127e-05`, very close to the covariance itself),

$$r = \frac{\operatorname{cov}_{x,y}}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$

you get a high correlation coefficient of $r \approx 0.99$, which confirms what you see in your plot.
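The standardization can be checked in Python with NumPy. This minimal sketch divides the off-diagonal entry of `np.cov` by the sample standard deviations (`ddof=1`, to match the $n-1$ denominator) and compares the result with `np.corrcoef`. It then rescales the data by 1000, an arbitrary factor chosen only for illustration, to show that the covariance grows by $10^6$ while $r$ does not change.

```python
import numpy as np

# Data from the question
x1 = np.array([0.03551153, 0.01656052, 0.03344669, 0.02551755, 0.02344788,
               0.02904475, 0.03334179, 0.02683399, 0.02966126, 0.03947681,
               0.02537157, 0.03015175, 0.02206443, 0.03590149, 0.03702152,
               0.02697212, 0.03777607, 0.02468797, 0.03489873, 0.02167536])
x2 = np.array([0.0372599, 0.02398212, 0.03649548, 0.03145494, 0.02925334,
               0.03328783, 0.03638871, 0.03196318, 0.03347346, 0.03874528,
               0.03098697, 0.03357531, 0.02808358, 0.03747998, 0.03804655,
               0.03213286, 0.03827639, 0.02999955, 0.0371424, 0.0279254])

cov12 = np.cov(x1, x2)[0, 1]

# Sample standard deviations; ddof=1 matches the n - 1 used by np.cov
s1 = x1.std(ddof=1)
s2 = x2.std(ddof=1)

r = cov12 / (s1 * s2)
print(s1 * s2)                    # about 2.609e-05, close to the covariance
print(r)                          # about 0.993
print(np.corrcoef(x1, x2)[0, 1])  # NumPy's built-in correlation agrees

# Illustrative rescaling: covariance grows by 1000 * 1000, r is unchanged
x1_big, x2_big = 1000 * x1, 1000 * x2
print(np.cov(x1_big, x2_big)[0, 1])
print(np.corrcoef(x1_big, x2_big)[0, 1])
```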

**Attribution**: *Source: Link, Question Author: kilojoules, Answer Author: Stefan*