# Intuition / interpretation of a distribution of eigenvalues of a correlation matrix?

What is your intuition / interpretation of a distribution of eigenvalues of a correlation matrix? I tend to hear that usually the 3 largest eigenvalues are the most important, while those close to zero are noise. Also, I've seen a few research papers investigating how naturally occurring eigenvalue distributions differ from those calculated from random correlation matrices (again, distinguishing noise from signal).

> I tend to hear that usually the 3 largest eigenvalues are the most important, while those close to zero are noise

You can test for that. See the paper linked in this post for more detail. Again, if you're dealing with financial time series you might want to correct for leptokurticity first (i.e. consider the series of GARCH-adjusted returns, not the raw returns).
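As a rough illustration of such a test, one can compare the eigenvalues of an empirical correlation matrix with the Marchenko–Pastur upper edge $\lambda_+ = (1+\sqrt{p/n})^2$, which bounds the bulk of eigenvalues of a correlation matrix of pure i.i.d. noise. This is a minimal sketch on synthetic data (the one-factor returns and all parameter values are made up for illustration; in practice you would use your GARCH-adjusted returns):

```python
# Sketch: flag eigenvalues above the Marchenko-Pastur upper edge,
# lambda_+ = (1 + sqrt(p/n))^2, the noise bulk's upper bound for an
# i.i.d. sample correlation matrix. Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 20                       # n observations, p assets

# Synthetic returns: one common "market" factor plus idiosyncratic noise
factor = rng.standard_normal((n, 1))
returns = 0.5 * factor + rng.standard_normal((n, p))

corr = np.corrcoef(returns, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending order

mp_upper = (1 + np.sqrt(p / n)) ** 2   # Marchenko-Pastur upper edge
signal = eigvals[eigvals > mp_upper]   # eigenvalues unlikely to be pure noise
print(f"MP upper edge: {mp_upper:.3f}")
print(f"eigenvalues above it: {signal}")
```

With one genuine common factor, the top eigenvalue lands well above the edge while the rest stay inside the noise bulk, which is the qualitative pattern the cited papers look for.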

> I've seen a few research papers investigating how naturally occurring eigenvalue distributions differ from those calculated from random correlation matrices (again, distinguishing noise from signal).

Edward: Usually, one would do it the other way around: look at the multivariate distribution of eigenvalues (of correlation matrices) coming from the application you want. Once you have identified a credible candidate for the distribution of eigenvalues, it should be fairly easy to generate from it.

The best procedure for identifying the multivariate distribution of your eigenvalues depends on how many assets you want to consider simultaneously (i.e. the dimensions of your correlation matrix). There is a neat trick if $p\leq 10$ ($p$ being the number of assets).

1. Suppose you have $j=1,\dots,J$ sub-samples of multivariate data. You need an estimator of the variance-covariance matrix $\tilde{C}_j$ for each sub-sample $j$ (you could use the classical estimator or a robust alternative such as the fast MCD, which is well implemented in Matlab, SAS, S, R, …). As usual, if you're dealing with financial time series you would want to consider the series of GARCH-adjusted returns, not raw returns.
2. For each sub-sample $j$, compute $\tilde{\Lambda}_j=\left(\log(\tilde{\lambda}_1^j),\dots,\log(\tilde{\lambda}_p^j)\right)$, where $\tilde{\lambda}_1^j,\dots,\tilde{\lambda}_p^j$ are the eigenvalues of $\tilde{C}_j$.
3. Compute $CV(\tilde{\Lambda})$, the convex hull of the $J \times p$ matrix whose $j$-th row is $\tilde{\Lambda}_j$ (again, this is well implemented in Matlab, R, …).
4. Draw points at random from inside $CV(\tilde{\Lambda})$. This is done by giving weight $w_i=\frac{\gamma_i}{\sum_{i=1}^{m}\gamma_i}$ to each of the $m$ vertices of $CV(\tilde{\Lambda})$, where each $\gamma_i$ is a draw from a unit exponential distribution (more details here).
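The steps above can be sketched as follows. This is a hypothetical illustration on synthetic data: it uses the classical correlation estimator rather than the fast MCD, skips the GARCH adjustment, and the mixing matrix and sample sizes are invented; the Dirichlet weighting in step 4 (exponential draws normalized by their sum) follows the construction described above:

```python
# Sketch of steps 1-4: sub-sample log-eigenvalues, convex hull,
# then random draws from inside the hull via Dirichlet-weighted
# combinations of its vertices. Synthetic data, hypothetical parameters.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)
J, n, p = 50, 250, 3                  # J sub-samples, n obs each, p assets
mix = np.array([[1.0, 0.3, 0.2],      # invented mixing matrix to induce
                [0.0, 1.0, 0.3],      # correlation between the p series
                [0.0, 0.0, 1.0]])

# Steps 1-2: log-eigenvalues of each sub-sample's correlation matrix
log_eigs = np.empty((J, p))
for j in range(J):
    X = rng.standard_normal((n, p)) @ mix
    C = np.corrcoef(X, rowvar=False)
    log_eigs[j] = np.log(np.linalg.eigvalsh(C))

# Step 3: convex hull of the J points in R^p
hull = ConvexHull(log_eigs)
vertices = log_eigs[hull.vertices]    # extreme points of the cloud

# Step 4: weights w_i = gamma_i / sum(gamma), gamma_i ~ Exp(1),
# applied to the hull's vertices -> uniform-ish draws inside the hull
gamma = rng.exponential(1.0, size=(5, len(vertices)))
weights = gamma / gamma.sum(axis=1, keepdims=True)
samples = weights @ vertices          # 5 new log-eigenvalue vectors
print(samples)
```

Each row of `samples` is a convex combination of hull vertices, so it lies inside $CV(\tilde{\Lambda})$; exponentiating a row gives a simulated eigenvalue vector.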
A limitation is that fast computation of the convex hull of a set of points becomes extremely slow when the number of dimensions is larger than 10. Note also that you need at least $J\geq2$ sub-samples.