Relationship between Gram and covariance matrices

For an $n\times p$ matrix $X$, where $p \gg n$, what is the relationship between $X^{T}X$ (the scatter matrix, on which the covariance matrix is based) and $XX^{T}$ (sometimes called the Gram matrix)?

If one of them is known, how can the other be obtained, to the extent that this is possible at all?


A Singular Value Decomposition (SVD) of $X$ expresses it as

$$X = U D V^\prime$$

where $U$ is an $n\times r$ matrix whose columns are mutually orthonormal, $V$ is a $p\times r$ matrix whose columns are mutually orthonormal, and $D$ is an $r\times r$ diagonal matrix with positive values (the “singular values” of $X$) on the diagonal. Necessarily $r$, which is the rank of $X$, can be no greater than either $n$ or $p$.
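As a numerical sketch of this decomposition (the variable names are mine, not from the answer; numpy's thin SVD returns exactly the $U$, $D$, $V^\prime$ factors described above when $X$ has full rank):

```python
import numpy as np

# A short, wide matrix X with p >> n, as in the question.
rng = np.random.default_rng(0)
n, p = 5, 50
X = rng.standard_normal((n, p))

# Thin SVD: U is n x r, s holds the singular values (diagonal of D),
# Vt is r x p. Here r = min(n, p) = n, since a random X has full rank.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = len(s)

# The columns of U and of V (= Vt.T) are orthonormal, and X = U D V'.
assert np.allclose(U.T @ U, np.eye(r))
assert np.allclose(Vt @ Vt.T, np.eye(r))
assert np.allclose(U @ np.diag(s) @ Vt, X)
```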

Using this, together with $U^\prime U = V^\prime V = I_r$ (orthonormal columns) and $D^\prime = D$ (since $D$ is diagonal), we compute

$$X^\prime X = (U D V^\prime)^\prime U D V^\prime = V D^\prime U^\prime U D V^\prime = V D^2 V^\prime$$

and

$$X X^\prime = U D V^\prime (U D V^\prime)^\prime = U D V^\prime V D^\prime U^\prime = U D^2 U^\prime.$$
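These two identities are easy to check numerically (a sketch with my own variable names; the eigenvalue comparison confirms that both products share the squared singular values $D^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 30
X = rng.standard_normal((n, p))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# X'X = V D^2 V'  (p x p, rank n)  and  XX' = U D^2 U'  (n x n).
assert np.allclose(X.T @ X, Vt.T @ np.diag(s**2) @ Vt)
assert np.allclose(X @ X.T, U @ np.diag(s**2) @ U.T)

# Their nonzero eigenvalues coincide: both are the squared singular values.
eig_small = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
assert np.allclose(eig_small, s**2)
```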

Although we can recover $D^2$ by diagonalizing either $X^\prime X$ or $X X^\prime$, the former gives no information about $U$ and the latter gives no information about $V$. However, $U$ and $V$ are completely independent of each other: starting with one of them, along with $D$, you can choose the other arbitrarily (subject to the orthonormality conditions) and construct a valid matrix $X$. Therefore $D^2$ contains all the information that is common to the matrices $X^\prime X$ and $X X^\prime$.
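The freedom in choosing $U$ can be demonstrated directly: keep $D$ and $V$ from one matrix, swap in a different orthonormal $U$, and the scatter matrix $X^\prime X$ is unchanged even though $X$ itself is not. A sketch (variable names are mine; the replacement $U$ is built with a QR factorization of a random matrix, which yields orthonormal columns):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 30
X = rng.standard_normal((n, p))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Replace U by an arbitrary matrix with orthonormal columns
# (here r = n, so any n x n orthogonal matrix Q will do).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X2 = Q @ np.diag(s) @ Vt

# X2 has exactly the same scatter matrix X'X as X ...
assert np.allclose(X.T @ X, X2.T @ X2)
# ... even though X2 is a different matrix.
assert not np.allclose(X, X2)
```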

There is a nice geometric interpretation that helps make this convincing. The SVD allows us to view any linear transformation $T_X$ (as represented by the matrix $X$) from $\mathbb{R}^p$ to $\mathbb{R}^n$ in terms of three easily understood linear transformations:

$V$ is the matrix of a transformation $T_V:\mathbb{R}^r \to \mathbb{R}^p$ that is one-to-one (has no kernel) and isometric. That is, it rotates $\mathbb{R}^r$ into an $r$-dimensional subspace $T_V(\mathbb{R}^r)$ of a $p$-dimensional space.

$U$ similarly is the matrix of a one-to-one, isometric transformation $T_U:\mathbb{R}^r\to \mathbb{R}^n$.

$D$ positively rescales the $r$ coordinate axes in $\mathbb{R}^r$, corresponding to a linear transformation $T_D$ that distorts the unit sphere (used for reference) into an ellipsoid without rotating it.

The transpose of $V$, $V^\prime$, corresponds to a linear transformation $T_{V^\prime}:\mathbb{R}^p\to\mathbb{R}^r$ that kills all vectors in $\mathbb{R}^p$ that are perpendicular to $T_V(\mathbb{R}^r)$. It otherwise rotates $T_V(\mathbb{R}^r)$ into $\mathbb{R}^r$. Equivalently, you can think of $T_{V^\prime}$ as “ignoring” any perpendicular directions and establishing an orthonormal coordinate system within $T_V(\mathbb{R}^r) \subset \mathbb{R}^p$. $T_D$ acts directly on that coordinate system, expanding by various amounts (as specified by the singular values) along the coordinate axes determined by $V$. $T_U$ then maps the result into $\mathbb{R}^n$.
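The two properties used here, that $T_V$ is isometric and that $T_{V^\prime}$ kills everything perpendicular to $T_V(\mathbb{R}^r)$, can be verified numerically (a sketch with my own variable names):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 30
_, _, Vt = np.linalg.svd(rng.standard_normal((n, p)), full_matrices=False)
V = Vt.T  # p x r, with orthonormal columns
r = V.shape[1]

# T_V is isometric: it preserves the length of every vector in R^r.
x = rng.standard_normal(r)
assert np.isclose(np.linalg.norm(V @ x), np.linalg.norm(x))

# T_{V'} kills vectors perpendicular to T_V(R^r): take a random vector
# and remove its component inside the column space of V.
y = rng.standard_normal(p)
y_perp = y - V @ (V.T @ y)
assert np.allclose(V.T @ y_perp, 0)
```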

The linear transformation associated with $X^\prime X$ in effect acts on $T_V(\mathbb{R}^r)$ through two “round trips”: $T_X$ expands the coordinates in the system determined by $V$ via $T_D$, and then $T_{X^\prime}$ does it all over again. Similarly, $X X^\prime$ does exactly the same thing to the $r$-dimensional subspace of $\mathbb{R}^n$ established by the $r$ orthogonal columns of $U$. Thus, the role of $V$ is to describe a frame in a subspace of $\mathbb{R}^p$ and the role of $U$ is to describe a frame in a subspace of $\mathbb{R}^n$. The matrix $X^\prime X$ gives us information about the frame in the first space and $X X^\prime$ tells us the frame in the second space, but those two frames don’t have to have any relationship at all to one another.

Source: Link, Question Author: Amir, Answer Author: whuber