Sufficient and necessary conditions for zero eigenvalue of a correlation matrix

Given $n$ random variables $X_i$ with joint probability distribution $P(X_1,\dots,X_n)$, the correlation matrix $C_{ij}=E[X_iX_j]-E[X_i]E[X_j]$ is positive semi-definite, i.e. its eigenvalues are positive or zero.
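
As a quick numerical illustration of this positive semi-definiteness, here is a minimal sketch in numpy; the particular distribution and sample size are arbitrary choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of three variables (rows are variables, columns are samples);
# the joint distribution here is just an arbitrary illustration.
X = rng.normal(size=(3, 10_000))

# Sample estimate of C_ij = E[X_i X_j] - E[X_i] E[X_j].
C = np.cov(X)

# All eigenvalues are (numerically) nonnegative: C is positive semi-definite.
print(np.linalg.eigvalsh(C))
```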

I am interested in the conditions on $P$ that are necessary and/or sufficient for $C$ to have $m$ zero eigenvalues. For instance, a sufficient condition is that the random variables are linearly dependent: $\sum_i u_i X_i = 0$ for some real numbers $u_i$, not all zero. For example, if $P(X_1,\dots,X_n)=\delta(X_1-X_2)\,p(X_2,\dots,X_n)$, then $u=(1,-1,0,\dots,0)$ is an eigenvector of $C$ with zero eigenvalue. If we have $m$ independent linear constraints of this type on the $X_i$'s, they imply $m$ zero eigenvalues.
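
For concreteness, a minimal numerical sketch of the $\delta(X_1-X_2)$ example (the specific distributions and sample size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

x2 = rng.normal(size=100_000)
x3 = rng.uniform(size=100_000)
x1 = x2.copy()                      # enforce the constraint X1 = X2

C = np.cov(np.vstack([x1, x2, x3]))
u = np.array([1.0, -1.0, 0.0])

print(C @ u)                        # ~ (0, 0, 0): u is an eigenvector with eigenvalue 0
print(np.linalg.eigvalsh(C))        # the smallest eigenvalue is ~ 0
```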

There is at least one additional (but trivial) possibility, when $X_a=E[X_a]$ for some $a$ (i.e. $P(X_1,\dots,X_n)\propto\delta(X_a-E[X_a])$), since in that case $C_{ij}$ has a column and a row of zeros: $C_{ia}=C_{ai}=0$ for all $i$. As it is not really interesting, I am assuming that the probability distribution is not of that form.
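
The trivial case is equally easy to see numerically (again only an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

xa = np.full(50_000, 3.14)          # X_a equal to a constant, i.e. to its own expectation
xb = rng.normal(size=50_000)

C = np.cov(np.vstack([xa, xb]))
print(C)                            # the first row and first column are zero
```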

My question is: are linear constraints the only way to induce zero eigenvalues (if we forbid the trivial exception given above), or can non-linear constraints on the random variables also generate zero eigenvalues of $C$?

Answer

Perhaps by simplifying the notation we can bring out the essential ideas. It turns out we don't need to involve expectations or complicated formulas, because everything is purely algebraic.


The algebraic nature of the mathematical objects

The question concerns relationships between (1) the covariance matrix of a finite set of random variables $X_1,\dots,X_n$ and (2) linear relations among those variables, considered as vectors.

The vector space in question is the set of all finite-variance random variables (on any given probability space $(\Omega,P)$) modulo the subspace of almost surely constant variables, denoted $L^2(\Omega,P)/\mathbb{R}$. (That is, we consider two random variables $X$ and $Y$ to be the same vector when there is zero chance that $X-Y$ differs from its expectation.) We are dealing only with the finite-dimensional vector space $V$ generated by the $X_i$, which is what makes this an algebraic problem rather than an analytic one.

What we need to know about variances

$V$ is more than just a vector space: it is a quadratic module, because it comes equipped with the variance. All we need to know about variances are two things:

  1. The variance is a scalar-valued function $Q$ with the property that $Q(aX)=a^2 Q(X)$ for all real numbers $a$ and vectors $X$.

  2. The variance is nondegenerate.

The second needs some explanation. $Q$ determines a "dot product," which is a symmetric bilinear form given by

$$X\cdot Y=\tfrac{1}{4}\bigl(Q(X+Y)-Q(X-Y)\bigr).$$

(This is of course nothing other than the covariance of the variables $X$ and $Y$.) Vectors $X$ and $Y$ are orthogonal when their dot product is $0$. The orthogonal complement of any set of vectors $A\subset V$ consists of all vectors orthogonal to every element of $A$, written

$$A^0=\{v\in V\mid a\cdot v=0\ \text{for all}\ a\in A\}.$$

It is clearly a vector space. When $V^0=\{0\}$, $Q$ is nondegenerate.
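
A quick numerical check of the polarization identity above, read as a statement about sample variances and covariances (a sketch; the joint distribution is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)       # Cov(X, Y) = 0.5 by construction

Q = np.var                                   # the quadratic form: the (biased) sample variance

dot = 0.25 * (Q(x + y) - Q(x - y))           # X.Y via the polarization identity
cov = np.cov(x, y, bias=True)[0, 1]          # the sample covariance, same normalization

print(dot, cov)                              # both ~ 0.5, and equal up to rounding error
```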

Allow me to prove that the variance is indeed nondegenerate, even though it might seem obvious. Suppose $X$ is a nonzero element of $V^0$. This means $X\cdot Y=0$ for all $Y\in V$; equivalently,

$$Q(X+Y)=Q(X-Y)$$

for all vectors $Y$. Taking $Y=X$ gives

$$4Q(X)=Q(2X)=Q(X+X)=Q(X-X)=Q(0)=0$$

and thus $Q(X)=0$. However, we know (using Chebyshev's Inequality, perhaps) that the only random variables with zero variance are almost surely constant, which identifies them with the zero vector in $V$, QED.

Interpreting the questions

Returning to the questions, in the preceding notation the covariance matrix of the random variables is just a regular array of all their dot products,

$$T=\bigl(X_i\cdot X_j\bigr).$$

There is a good way to think about $T$: it defines a linear transformation on $\mathbb{R}^n$ in the usual way, by sending any vector $x=(x_1,\dots,x_n)\in\mathbb{R}^n$ to the vector $T(x)=y=(y_1,\dots,y_n)$ whose $i$th component is given by the matrix multiplication rule

$$y_i=\sum_{j=1}^{n}(X_i\cdot X_j)\,x_j.$$

The kernel of this linear transformation is the subspace it sends to zero:

$$\operatorname{Ker}(T)=\{x\in\mathbb{R}^n\mid T(x)=0\}.$$

The foregoing equation implies that when $x\in\operatorname{Ker}(T)$, for every $i$

$$0=y_i=\sum_{j=1}^{n}(X_i\cdot X_j)\,x_j=X_i\cdot\Bigl(\sum_j x_j X_j\Bigr).$$

Since this is true for every $i$, the vector $\sum_j x_j X_j$ is orthogonal to every vector spanned by the $X_i$: namely, all of $V$. Consequently, when $x\in\operatorname{Ker}(T)$, the vector $\sum_j x_j X_j$ lies in $V^0$. Because the variance is nondegenerate, this means $\sum_j x_j X_j=0$. That is, $x$ describes a linear dependency among the $n$ original random variables.
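
This correspondence is easy to watch in action. Here is a minimal numerical sketch, where the particular dependency $X_3 = 2X_1 - X_2 + 7$ is an invented example:

```python
import numpy as np

rng = np.random.default_rng(4)

x1 = rng.normal(size=100_000)
x2 = rng.normal(size=100_000)
x3 = 2.0 * x1 - x2 + 7.0                 # a linear dependency, up to an additive constant

T = np.cov(np.vstack([x1, x2, x3]))      # the matrix of dot products X_i . X_j

eigvals, eigvecs = np.linalg.eigh(T)     # eigenvalues in ascending order
kernel_vec = eigvecs[:, 0]               # eigenvector for the (numerically) zero eigenvalue

print(eigvals[0])                        # ~ 0
print(kernel_vec / kernel_vec[0])        # ~ (1, -0.5, -0.5), proportional to (2, -1, -1),
                                         # i.e. the dependency 2*X1 - X2 - X3 = constant
```

Note that the additive constant $+7$ leaves no trace in the covariance matrix, which is exactly the quotient by constant variables described above.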

You can readily check that this chain of reasoning is reversible:

Linear dependencies among the $X_j$ as vectors are in one-to-one correspondence with elements of the kernel of $T$.

(Remember, this statement still considers the $X_j$ as defined only up to a constant shift in location, that is, as elements of $L^2(\Omega,P)/\mathbb{R}$, rather than as just random variables.)

Finally, by definition, an eigenvalue of $T$ is any scalar $\lambda$ for which there exists a nonzero vector $x$ with $T(x)=\lambda x$. When $\lambda=0$ is an eigenvalue, the space of associated eigenvectors, together with the zero vector, is precisely the kernel of $T$.
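
To close the loop numerically: with $m$ independent linear constraints, the zero eigenvalue appears with multiplicity $m$ (again just a sketch with invented constraints):

```python
import numpy as np

rng = np.random.default_rng(5)

x1 = rng.normal(size=100_000)
x2 = rng.normal(size=100_000)
x3 = x1 + x2                              # first linear constraint
x4 = x1 - x2 + 1.0                        # second, independent linear constraint

T = np.cov(np.vstack([x1, x2, x3, x4]))

# Two independent linear dependencies -> a two-dimensional kernel -> two zero eigenvalues.
print(np.linalg.eigvalsh(T))              # the two smallest eigenvalues are ~ 0
```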


Summary

We have arrived at the answer to the questions: the set of linear dependencies of the random variables, qua elements of $L^2(\Omega,P)/\mathbb{R}$, corresponds one-to-one with the kernel of their covariance matrix $T$. This is so because the variance is a nondegenerate quadratic form. The kernel also is the eigenspace associated with the zero eigenvalue (or just the zero subspace when there is no zero eigenvalue).


Reference

I have largely adopted the notation and some of the language of Chapter IV in

Jean-Pierre Serre, A Course in Arithmetic. Springer-Verlag, 1973.

Attribution
Source: Link, Question Author: Adam, Answer Author: Community
