In general, how many points are needed to estimate a p-dimensional covariance matrix? Does it depend on how the data are spread out across the different dimensions? Does it depend on the true distribution of the data? Thank you!

**Answer**

As @whuber has commented (+1 to his comment) you need “*p data points in p dimensions*” under the assumption those points are independent. Nevertheless as you correctly recognise this will depended on the underlying distribution of your data as well as your sample size. That is because in finite samples sizes you have *finite sample effects*. That means that, in very approximate manner, your random sample exhibits properties (usually *regularities* in the form of collinearity) that should not be there. This is not something new: Finite or small sample corrections are something that is done ubiquitously in Statistics, for example the very popular Akaike Information Criterion (AIC) has a very easy to compute version that is corrected for finite samples: the AICc (that is unfortunately underused – AICc corrects for deviations from normalities, not collinearities). So how bad things might be?

[A quick note: A singular covariance matrix is essentially one that is not positive definite (PD). You can check if a matrix is PD by checking if it has a Cholesky decomposition. That is much faster than using eigendecomposition or SVD.]

Let’s say is looking at a random sample S1 such that s1∼U[0,1] and another sample S2 such that s2∼Tν=1 (ν being the degrees of freedom). How would thing go with a relatively small (<200) sample? Well… not splendid! Let’s simulate something like this (in MATLAB):

```
rng(1234)
N = 200;
M = 200;
FailsU = zeros(N,M);
FailsT = zeros(N,M);
for i = 1:M
for j = 1:N;
K = cov(rand(i));
[L,p] = chol(K,'lower');
if(p)
FailsU(j,i) = 1;
end
K = cov(random('t',1,[i,i]));
[L,p] = chol(K,'lower');
if(p)
FailsT(j,i) = 1;
end
end
end
```

About 50% of the time you will get a non-PD matrix. That’s definitely not good. What about if we had p+1 samples though?

```
% Same initializations as above
for i = 1:M
for j = 1:N;
K = cov(rand(i+1,i));
[L,p] = chol(K,'lower');
if(p)
Fails1(j,i) = 1;
end
K = cov(random('t',1,[i+1,i]));
[L,p] = chol(K,'lower');
if(p)
FailsT1(j,i) = 1;
end
end
end
```

Yeap we are fine! Why all this trouble though?

Let see the p×p version first:

When you are saying that a matrix is non-invertible or it has zero eigenvalues you are saying it is rank deficient. That is because the rank of a covariance matrix can be thought as the number of non-zero eigenvalues. In addition, when you compute the product STxSx you can get at most a matrix of rank p because the rank of a product of matrices is at most equal to the rank of any matrix in the product. That means that the covariance matrix of your p-dimensional sample has at most rank p.

Therefore for any number of samples N, the rank of Sx is at most p. From this we can assume that even small deviations from complete randomness would make our covariance matrix rank deficient (as seen in the first simulation). OK, fine why does p+1 seems to work so nicely?

Now let’s see the p+1×p version:

Think of how you estimate a sample covariance matrix: While in a quick manner we can write : K=1N−1STxSX (because we assumed Sx to have mean 0, we should properly write things as: K=1N−1ΣNi=1(Sx(i)−ˆμ)(Sx(i)−ˆμ)T where ˆμ is the sample mean. But what about the rank of this (Sx(i)−ˆμ)(Sx(i)−ˆμ)T matrix? Well… that is 1! Not only that but exactly because we subtracted ˆμ we reduced the original rank of the matrix Sx to begin with! So as we add up points (N gets larger) we have more chances to have a full-rank matrix. For the current case, exactly the covariance in our sample is completely diagonal, just adding another point ensure that it was *extremely unlikely* (not guaranteed) to have a rank deficiency.

**Attribution***Source : Link , Question Author : Vivek Subramanian , Answer Author : usεr11852*