Suppose I have a matrix X which is n by p, i.e. it has n observations, with each observation in p-dimensional space.

How do I find the variance of these n observations?

In the case where p = 1, I just need to use the regular variance formula. What about the cases where p > 1.

**Answer**

For a p-dimensional random variable X=(X1,…,Xp)⊺, we have the following definition of the variance:

Var\left( X \right) = E\left[ {\left( {X – EX} \right){{\left( {X – EX} \right)}^\intercal}} \right] = \left( {\begin{array}{*{20}{c}}

{Var\left( {{X_1}} \right)}& \ldots &{Cov\left( {{X_1},{X_p}} \right)} \\

\vdots & \ddots & \vdots \\

{Cov\left( {{X_p},{X_1}} \right)}& \ldots &{Var\left( {{X_p}} \right)}

\end{array}} \right)

That is, the variance of a random vector is defined as the matrix which stores all the variances on the main diagonal and the covariances between the different components in the other elements. The sample p \times p covariance matrix would then be calculated by plugging in the sample analogs for the population variables:

\frac{1}{{n – 1}}\left( {\begin{array}{*{20}{c}}

{\sum\limits_{i = 1}^n {{{\left( {{X_{i1}} – {{\bar X}_{\cdot1}}} \right)}^2}} }& \ldots &{\sum\limits_{i = 1}^n {\left( {{X_{i1}} – {{\bar X}_{\cdot1}}} \right)\left( {{X_{ip}} – {{\bar X}_{\cdot p}}} \right)} } \\

\vdots & \ddots & \vdots \\

{\sum\limits_{i = 1}^n {\left( {{X_{ip}} – {{\bar X}_{\cdot p}}} \right)\left( {{X_{i1}} – {{\bar X}_{\cdot 1}}} \right)} }& \ldots &{\sum\limits_{i = 1}^n {{{\left( {{X_{ip}} – {{\bar X}_{\cdot p}}} \right)}^2}} }

\end{array}} \right)

where {X_{ij}} denotes the ith observation for feature j and {{\bar X}_{ \cdot j}} the sample mean of the jth feature. To sum up, the variance of a random vector is defined as the matrix containing the individual variances and covariances. It therefore suffices to calculate the sample variances and covariances for all the vector components individually.

**Attribution***Source : Link , Question Author : statnub , Answer Author : Philipp Burckhardt*