I have two correlation matrices $A$ and $B$ (computed with Pearson’s linear correlation coefficient via Matlab’s `corrcoef()`). I would like to quantify how much “more correlation” $A$ contains compared to $B$. Is there any standard metric or test for that?

E.g. the correlation matrix

*(first example matrix, image omitted)*

contains “more correlation” than

*(second example matrix, image omitted)*

I am aware of the Box’s M Test, which is used to determine whether two or more covariance matrices are equal (and can be used for correlation matrices as well since the latter are the same as the covariance matrices of standardized random variables).

Right now I am comparing $A$ and $B$ via the mean of the absolute values of their off-diagonal elements, i.e. $\frac{2}{n^2-n}\sum_{1 \leq i < j \leq n } \left | x_{i, j} \right |$ (this uses the symmetry of the correlation matrix). I suspect there are cleverer metrics.
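For reference, this metric is straightforward to write down in Python/NumPy (a sketch; the function name `metric_mean` is mine, mirroring the notation below):

```python
import numpy as np

def metric_mean(C):
    """Mean of the absolute values of the off-diagonal elements of a
    symmetric n x n correlation matrix C, i.e.
    2/(n^2 - n) * sum_{i < j} |c_ij|, using the symmetry of C."""
    n = C.shape[0]
    iu = np.triu_indices(n, k=1)   # strict upper triangle: n(n-1)/2 entries
    return np.abs(C[iu]).mean()

# Sanity checks: the identity (no correlation) scores 0,
# the all-ones matrix (perfect correlation) scores 1.
print(metric_mean(np.eye(3)))      # 0.0
print(metric_mean(np.ones((3, 3))))  # 1.0
```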

Following Andy W’s comment on the matrix determinant, I ran an experiment to compare the metrics:

- Mean of the absolute values of the off-diagonal elements: $\text{metric}_\text{mean}()$
- Matrix determinant: $\text{metric}_\text{determinant}()$

Let $A$ and $B$ be two random symmetric matrices of dimension $10 \times 10$ with ones on the diagonal. The upper triangle (diagonal excluded) of $A$ is populated with random floats from 0 to 1. The upper triangle (diagonal excluded) of $B$ is populated with random floats from 0 to 0.9. I generate 10000 such matrices and do some counting:

- $\text{metric}_\text{mean}(B) \leq \text{metric}_\text{mean}(A) $ 80.75% of the time
- $\text{metric}_\text{determinant}(B) \leq \text{metric}_\text{determinant}(A)$ 63.01% of the time
Given the result I would tend to think that $\text{metric}_\text{mean}$ is the better metric.
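The counting experiment can also be replicated in Python/NumPy (a sketch; the seed and helper name are mine, and the metric used is the mean absolute off-diagonal entry as defined above):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_corr_like(n, low, high, rng):
    """Symmetric matrix with unit diagonal and Uniform(low, high) entries
    in the strict upper triangle (mirrored below). Note: not guaranteed
    positive semi-definite, matching generate_random_symmetric_matrix."""
    t = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    t[iu] = rng.uniform(low, high, size=len(iu[0]))
    return np.eye(n) + t + t.T

runs, n = 5000, 10
iu = np.triu_indices(n, k=1)
wins_mean = wins_det = 0
for _ in range(runs):
    A = random_corr_like(n, 0.0, 1.0, rng)
    B = random_corr_like(n, 0.0, 0.9, rng)
    wins_mean += np.abs(B[iu]).mean() <= np.abs(A[iu]).mean()
    wins_det += abs(np.linalg.det(B)) <= abs(np.linalg.det(A))

print(wins_mean / runs)  # ~0.81, consistent with the ~80.75% above
print(wins_det / runs)   # ~0.63, consistent with the ~63.01% above
```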

Matlab code:

```matlab
function [ ] = correlation_metric( )
%CORRELATION_METRIC Test some metric for
% http://stats.stackexchange.com/q/110416/12359 :
% I have 2 correlation matrices A and B (using the Pearson's linear
% correlation coefficient through Matlab's corrcoef()).
% I would like to quantify how much "more correlation"
% A contains compared to B. Is there any standard metric or test for that?

% Experiments' parameters
runs = 10000;
matrix_dimension = 10;

%% Experiment 1
results = zeros(runs, 3);
for i=1:runs
    dimension = matrix_dimension;
    M = generate_random_symmetric_matrix( dimension, 0.0, 1.0 );
    results(i, 1) = abs(det(M));
    % results(i, 2) = mean(triu(M, 1));
    results(i, 2) = mean2(M);
    % results(i, 3) = results(i, 2) < results(i, 2) ;
end
mean(results(:, 1))
mean(results(:, 2))

%% Experiment 2
results = zeros(runs, 6);
for i=1:runs
    dimension = matrix_dimension;
    M = generate_random_symmetric_matrix( dimension, 0.0, 1.0 );
    results(i, 1) = abs(det(M));
    results(i, 2) = mean2(M);
    M = generate_random_symmetric_matrix( dimension, 0.0, 0.9 );
    results(i, 3) = abs(det(M));
    results(i, 4) = mean2(M);
    results(i, 5) = results(i, 1) > results(i, 3);
    results(i, 6) = results(i, 2) > results(i, 4);
end
mean(results(:, 5))
mean(results(:, 6))

boxplot(results(:, 1))
figure
boxplot(results(:, 2))

end

function [ random_symmetric_matrix ] = generate_random_symmetric_matrix( dimension, minimum, maximum )
% Based on http://www.mathworks.com/matlabcentral/answers/123643-how-to-create-a-symmetric-random-matrix
d = ones(dimension, 1); %rand(dimension,1); % The diagonal values
t = triu((maximum-minimum)*rand(dimension)+minimum,1); % The upper triangular random values
random_symmetric_matrix = diag(d)+t+t.'; % Put them together in a symmetric matrix
end
```

Example of a generated $10 \times 10$ random symmetric matrix with ones on the diagonal:

```matlab
>> random_symmetric_matrix

random_symmetric_matrix =

    1.0000    0.3984    0.1375    0.4372    0.2909    0.6172    0.2105    0.1737    0.2271    0.2219
    0.3984    1.0000    0.3836    0.1954    0.5077    0.4233    0.0936    0.2957    0.5256    0.6622
    0.1375    0.3836    1.0000    0.1517    0.9585    0.8102    0.6078    0.8669    0.5290    0.7665
    0.4372    0.1954    0.1517    1.0000    0.9531    0.2349    0.6232    0.6684    0.8945    0.2290
    0.2909    0.5077    0.9585    0.9531    1.0000    0.3058    0.0330    0.0174    0.9649    0.5313
    0.6172    0.4233    0.8102    0.2349    0.3058    1.0000    0.7483    0.2014    0.2164    0.2079
    0.2105    0.0936    0.6078    0.6232    0.0330    0.7483    1.0000    0.5814    0.8470    0.6858
    0.1737    0.2957    0.8669    0.6684    0.0174    0.2014    0.5814    1.0000    0.9223    0.0760
    0.2271    0.5256    0.5290    0.8945    0.9649    0.2164    0.8470    0.9223    1.0000    0.5758
    0.2219    0.6622    0.7665    0.2290    0.5313    0.2079    0.6858    0.0760    0.5758    1.0000
```

**Answer**

The determinant of the covariance matrix isn’t a terrible idea, but you probably want to use the **inverse** of the determinant. Picture the contours (lines of equal probability density) of a bivariate distribution. You can think of the determinant as (approximately) measuring the volume enclosed by a given contour. A highly correlated set of variables then actually encloses less volume, because the contours are so stretched.

For example:

If $X \sim N(0, 1)$ and $Y = X + \epsilon$, where $\epsilon \sim N(0, .01)$, then

$$
\operatorname{Cov}(X, Y) = \begin{bmatrix}
1 & 1 \\
1 & 1.01
\end{bmatrix}
$$

so

$$
\operatorname{Corr}(X, Y) \approx \begin{bmatrix}
1 & .995 \\
.995 & 1
\end{bmatrix}
$$

so the determinant is $1 - .995^2 \approx .0099$. On the other hand, if $X, Y$ are independent $N(0, 1)$, the correlation matrix is the identity and the determinant is 1.
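Checking the two determinants numerically (a Python/NumPy sketch of the calculation above):

```python
import numpy as np

# Correlation matrix of X and Y = X + eps, with eps ~ N(0, .01):
corr_xy = np.array([[1.0, 0.995],
                    [0.995, 1.0]])
print(np.linalg.det(corr_xy))    # 1 - .995^2 = 0.009975, i.e. ~ .0099

# Independent N(0, 1) variables: identity correlation matrix.
print(np.linalg.det(np.eye(2)))  # 1.0
```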

As any pair of variables becomes more nearly linearly dependent, the determinant approaches zero, since it’s the product of the eigenvalues of the correlation matrix. So the determinant cannot distinguish a single pair of nearly-dependent variables from many such pairs, which is probably not the behavior you want. I would suggest simulating such a scenario. You could use a scheme like this:

- Fix a dimension P, an approximate rank r, and let s be a large constant
- Let A[1], …, A[r] be random vectors, drawn iid from N(0, s) distribution
- Set Sigma = Identity(P)
- For i=1..r: Sigma = Sigma + A[i] * A[i]^T
- Set rho to be Sigma scaled as a correlation matrix

Then rho will have approximate rank r, which determines how many nearly linearly independent variables you have. You can see how the determinant reflects the approximate rank r and scaling s.
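The scheme above can be sketched in Python/NumPy (the helper name `low_rank_corr` is mine, and I take s to be the standard deviation of the normal draws):

```python
import numpy as np

def low_rank_corr(P, r, s, seed=None):
    """Correlation matrix of approximate rank r, built as
    Sigma = I_P + sum_{i=1..r} A[i] A[i]^T with A[i] ~ N(0, s^2 I_P),
    then rescaled to unit diagonal."""
    rng = np.random.default_rng(seed)
    Sigma = np.eye(P)
    for _ in range(r):
        a = rng.normal(0.0, s, size=P)
        Sigma += np.outer(a, a)
    d = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(d, d)   # scale Sigma to a correlation matrix

rho = low_rank_corr(P=10, r=2, s=10.0, seed=0)
print(np.linalg.det(rho))            # essentially zero
print(np.linalg.eigvalsh(rho)[-2:])  # the two dominant eigenvalues
```

For large s the determinant collapses toward zero as soon as r > 0, regardless of how many near-dependencies there are, which illustrates the limitation discussed above.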

**Attribution**
*Source: Link, Question Author: Franck Dernoncourt, Answer Author: Andrew M*