I am trying to understand how the covariance matrix works. Suppose we have two variables X, Y, where \text{Cov}(X,Y) = \mathbb{E}[(X -\mathbb{E}[X])(Y-\mathbb{E}[Y])] describes the relation between the variables, i.e. how much one depends on the other.

The three-variable case is less clear to me. An intuitive definition of a covariance function would be \text{Cov}(X,Y,Z) = \mathbb{E}[(X -\mathbb{E}[X])(Y-\mathbb{E}[Y])(Z-\mathbb{E}[Z])], but instead the literature suggests using the covariance matrix, which is defined as the two-variable covariance for each pair of variables.

So, does the covariance matrix include full information about the relations between the variables? If so, what is its relation to my definition of \text{Cov}(X,Y,Z)?

**Answer**

To expand on Zachary’s comment, the covariance matrix does not capture the “relation” between random variables, as “relation” is too broad a concept. For example, we’d probably want any measure of the “relation” between two variables to include their dependence on each other.

However, cov(X,Y)=0 does not imply that X and Y are independent. For example, take X~U(-1,1) and Y=X^2: Y is completely determined by X, yet cov(X,Y)=0 (for a short proof, see: https://en.wikipedia.org/wiki/Covariance#Uncorrelatedness_and_independence).

If the covariance did include full information about the relations between variables, as you ask, zero covariance would imply no dependence, and the example of X and X^2 shows that it does not. This is what Zachary means when he says that there can be non-linear dependences that the covariance does not capture.
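As a quick numerical illustration of the X~U(-1,1), Y=X^2 example, here is a minimal sketch using NumPy. The sample size, the seed and the choice of |X| as a second statistic are my own assumptions for illustration, not part of the original argument.

```python
import numpy as np

# Assumed setup: a large sample from U(-1, 1); seed and size are arbitrary.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200_000)
y = x ** 2  # y is a deterministic function of x

# Sample covariance between x and y: close to zero up to sampling noise.
sample_cov = np.cov(x, y)[0, 1]

# The dependence is still there: |x| and y are strongly correlated.
abs_corr = np.corrcoef(np.abs(x), y)[0, 1]

print(f"cov(X, Y)     = {sample_cov:.4f}")
print(f"corr(|X|, Y)  = {abs_corr:.4f}")
```

The first number should be near zero while the second is close to one, which is the sense in which the covariance misses a non-linear dependence.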

However, let X:=(X_{1},…,X_{n})’ be multivariate normal, X~N(\mu,\Sigma). Then X_{1},…,X_{n} are independent iff \Sigma is a diagonal matrix, i.e. all off-diagonal elements (all pairwise covariances) are 0.

To see that this condition is sufficient, observe that the joint density factors,

\begin{equation} f(x_{1},\dots,x_{n}) = \dfrac{1}{\sqrt{(2 \pi)^{n} | \Sigma |}} \exp\left(- \dfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu)\right) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2 \pi \sigma_{ii}}} \exp\left(- \dfrac{(x_{i}-\mu_{i})^{2}}{2 \sigma_{ii}}\right) = f_{1}(x_{1}) \cdots f_{n}(x_{n}) \end{equation}
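The factorization can be checked numerically. The sketch below is only an illustration: the helper names `mvn_pdf` and `normal_pdf`, and the particular values of \mu, \Sigma and x, are hypothetical choices of mine. It evaluates the joint density and the product of the univariate densities at one point, first for a diagonal \Sigma and then for a \Sigma with one non-zero covariance.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density, written out from the formula."""
    n = len(mu)
    diff = x - mu
    norm_const = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    quad = diff @ np.linalg.solve(sigma, diff)  # (x - mu)' Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / norm_const

def normal_pdf(x, mu, var):
    """Univariate normal density with variance `var` (elementwise)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical example values.
mu = np.array([0.5, -1.0, 2.0])
variances = np.array([1.0, 0.25, 4.0])
x = np.array([0.1, -0.7, 3.0])

# Diagonal Sigma: the joint density equals the product of the marginals.
sigma_diag = np.diag(variances)
joint_diag = mvn_pdf(x, mu, sigma_diag)
product = np.prod(normal_pdf(x, mu, variances))

# Same variances, but one non-zero covariance: the factorization fails.
sigma_dep = sigma_diag.copy()
sigma_dep[0, 1] = sigma_dep[1, 0] = 0.3
joint_dep = mvn_pdf(x, mu, sigma_dep)

print(np.isclose(joint_diag, product))  # diagonal case
print(np.isclose(joint_dep, product))   # non-diagonal case
```

Checking a single point does not prove anything, of course; it only shows the identity in the equation at work for one concrete choice of parameters.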

To see that the condition is necessary, recall the bivariate case. If X_{1} and X_{2} are independent, then X_{1} and X_{1}|X_{2} = x_{2} must have the same variance, so

\begin{equation} \sigma_{11}=\sigma_{11|2}=\sigma_{11}-\sigma^{2}_{12} \sigma^{-1}_{22} \end{equation}

which implies \sigma_{12}=0. By the same argument, all off-diagonal elements of \Sigma must be zero.
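The conditional-variance identity used here can also be checked by simulation. This is a rough sketch under my own assumptions: the function name `conditional_variance`, the covariance values and the sample size are hypothetical, and the residual variance from a linear fit of X_{1} on X_{2} is used as a stand-in for the conditional variance, which is justified for the bivariate normal.

```python
import numpy as np

def conditional_variance(s11, s12, s22):
    """sigma_{11|2} = sigma_11 - sigma_12^2 / sigma_22 (bivariate normal)."""
    return s11 - s12 ** 2 / s22

# Hypothetical covariance entries with a non-zero sigma_12.
s11, s12, s22 = 2.0, 0.8, 1.0

rng = np.random.default_rng(1)
draws = rng.multivariate_normal(
    [0.0, 0.0], [[s11, s12], [s12, s22]], size=200_000
)
x1, x2 = draws[:, 0], draws[:, 1]

# Variance of X1 after removing its linear dependence on X2.
c = np.cov(x1, x2)
beta = c[0, 1] / c[1, 1]
resid_var = np.var(x1 - beta * x2, ddof=1)

print(f"formula:   {conditional_variance(s11, s12, s22):.4f}")
print(f"simulated: {resid_var:.4f}")
print(f"with sigma_12 = 0: {conditional_variance(s11, 0.0, s22):.4f}")
```

With a non-zero \sigma_{12} the conditional variance is strictly smaller than \sigma_{11}; the two coincide only when \sigma_{12}=0, which is the step the argument relies on.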

(source: prof. Geert Dhaene’s Advanced Econometrics slides)

**Attribution** *Source: Link, Question Author: Karolis, Answer Author: hrrrrrr5602*