MVN is degenerate when the covariance matrix $\Sigma$ is singular.

I am trying to understand the mainly conceptual (but also theoretical) implications of this. The Wikipedia article is quite terse. It mentions the following non-trivial (at least to me) things:

MVN does not have a density. More precisely, it does not have a density with respect to $k$-dimensional Lebesgue measure.

What does having *no density* mean in simple terms, if possible? It seems to imply that there are things the distribution *does* have? And is it possible to take samples from this distribution?

Geometrically, this means that every contour ellipsoid is infinitely thin and has zero volume in $n$-dimensional space.

Does this relate to the degenerate case in the sense that along the dependent subspace the variance is 0? Thus, removing the dependent subspace from $\Sigma$ – and thus reducing the dimension of $\mathbf{x}$ – will make $\mathbf{x}$ have a proper density from which samples can be taken. The degenerate samples can then be reconstructed from $\mathbf{x}$. Is this correct?

It is suggested to use the following density instead:
$$f(\mathbf{x}) = \left(\det\nolimits^{*}(2\pi\boldsymbol{\Sigma})\right)^{-\frac{1}{2}} \, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})' \boldsymbol{\Sigma}^{+}(\mathbf{x}-\boldsymbol{\mu})}$$

Suppose I use the Moore-Penrose pseudoinverse $\boldsymbol{\Sigma}^{+}$ and disregard the zero eigenvalues in the determinant calculation (so $\det^{*}$ is the pseudo-determinant, the product of the non-zero eigenvalues). Now I have a density. How are samples from this density related to the degenerate case?
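As a numerical sketch of that recipe (the example covariance, the rank cutoff `1e-12`, and the evaluation point are arbitrary choices): compute the pseudo-determinant and pseudo-inverse by hand and compare against SciPy's `multivariate_normal`, whose `allow_singular=True` option evaluates, as far as I know, this same pseudo-density.

```python
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[1.0, 2.0]])        # 1x2 factor, so Sigma below has rank 1
Sigma = A.T @ A                    # singular 2x2 covariance matrix
mu = np.array([0.5, -1.0])

# pseudo-determinant: product of the non-zero eigenvalues only
eigvals = np.linalg.eigvalsh(Sigma)
nonzero = eigvals[eigvals > 1e-12]     # rank cutoff is an arbitrary choice
r = len(nonzero)                       # rank of Sigma
pdet = np.prod(nonzero)

x = mu + A[0]                          # a point on the support of the distribution
quad = (x - mu) @ np.linalg.pinv(Sigma) @ (x - mu)
dens = (2 * np.pi) ** (-r / 2) * pdet ** (-0.5) * np.exp(-0.5 * quad)

rv = multivariate_normal(mean=mu, cov=Sigma, allow_singular=True)
print(dens, rv.pdf(x))                 # the two values should agree
```

Note that this "density" only makes sense on the support (the affine subspace $\boldsymbol{\mu} + \operatorname{range}(\boldsymbol{\Sigma})$); off that subspace the probability is exactly zero regardless of the formula's value.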

Wikipedia doesn't mention it, but what about the case of a non-singular matrix with negative eigenvalues? The determinant might or might not be negative then.

Positive-definiteness is a stricter concept than non-singularity. How does that relate?

**Answer**

A joint density function, say of two random variables $X$ and $Y$, is an ordinary function $f_{X,Y}(x,y)$ of two real variables. The meaning that we ascribe to it is that if $\mathcal B$ is a region

of *very small* area $b$ with the property that $(x_0, y_0) \in \mathcal B$, then

$$P\{(X,Y)\in \mathcal B\} \approx f_{X,Y}(x_0,y_0)\cdot b \tag 1$$

and that this approximation gets better and better as $\mathcal B$ shrinks

in area, and $b \to 0$. Of course, both sides of $(1)$ approach $0$ as

$b \to 0$, but the *ratio* $\frac{P\{(X,Y)\in \mathcal B\}}{b}$ is converging to $f_{X,Y}(x_0,y_0)$. If we think of probability as probability mass

spread over the $x$-$y$ plane, then $f_{X,Y}(x_0,y_0)$ is the *density*

of the probability mass at the point $(x_0,y_0)$. Note that

$f_{X,Y}(x,y)$ is not a probability, but a probability *density*, and

it is measured in probability mass *per unit area*. In particular,

note that it is possible for $f_{X,Y}(x_0,y_0)$ to exceed $1$ (probability mass is very dense at $(x_0,y_0)$), and we need to multiply it by an *area* (as in $(1)$) to get a probability from it.
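A quick Monte Carlo illustration of $(1)$ for two independent standard normal variables (the point $(x_0, y_0)$, the square's side length, and the sample size are arbitrary choices): the fraction of samples landing in a small square $\mathcal B$, divided by its area, approximates the density at its center.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
x = rng.standard_normal(n)             # X ~ N(0, 1)
y = rng.standard_normal(n)             # Y ~ N(0, 1), independent of X

# small square B of side h centered at (x0, y0)
x0, y0, h = 0.3, -0.2, 0.05
in_B = (np.abs(x - x0) < h / 2) & (np.abs(y - y0) < h / 2)

ratio = in_B.mean() / h**2             # P{(X,Y) in B} / area(B)
density = np.exp(-(x0**2 + y0**2) / 2) / (2 * np.pi)   # f_{X,Y}(x0, y0)
print(ratio, density)                  # the two numbers should be close
```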

With that as prologue, consider the case when $Y = \alpha X + \beta$. Now, the random point $(X,Y)$ is constrained to lie on the straight line

$y = \alpha x + \beta$ in the $x$-$y$ plane. Consequently, $X$ and $Y$

do not enjoy a joint *density* because all the probability mass lies on

the straight line, which has zero area. (Remember that old shibboleth about a line having zero width that you learned in middle school?) So,

we cannot write something like $(1)$. The probability mass is all there;

it lies along the straight line $y = \alpha x + \beta$, but its *joint
density*

(in terms of mass per unit area) is infinite along that straight line.

So, now what? Well, the trick is to understand that we really have just

one random variable, and questions about $(X,Y)$ can be translated into

questions about just $X$, and answered in terms of $X$ alone.

For example, (with $\alpha > 0$)

$$F_{X,Y}(x_0,y_0) = P\{X\leq x_0, Y\leq y_0\}

= P\{X\leq x_0, \alpha X + \beta \leq y_0\}

= P\left\{X \leq \min\left(x_0, \frac{y_0-\beta}{\alpha}\right)\right\}.$$
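This formula is easy to check by simulation (the values of $\alpha$, $\beta$, $x_0$, $y_0$, and the sample size below are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

alpha, beta = 2.0, 1.0                 # Y = alpha*X + beta, with alpha > 0
rng = np.random.default_rng(1)
X = rng.standard_normal(1_000_000)
Y = alpha * X + beta

x0, y0 = 0.5, 1.4
empirical = np.mean((X <= x0) & (Y <= y0))        # F_{X,Y}(x0, y0) by simulation
exact = norm.cdf(min(x0, (y0 - beta) / alpha))    # the formula above
print(empirical, exact)                           # the two values should agree
```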

Note that all the usual rules apply even though $X$ and $Y$ do not have

a joint density. For example,

$$\operatorname{cov}(X,Y)= \operatorname{cov}(X,\alpha X+\beta)

= \alpha \operatorname{var}(X)$$

and so on.
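The covariance identity can be verified numerically as well (the choice of $\alpha$ and $\beta$ is again arbitrary):

```python
import numpy as np

alpha, beta = 2.0, 1.0
rng = np.random.default_rng(2)
X = rng.standard_normal(1_000_000)
Y = alpha * X + beta

# cov(X, Y) equals alpha * var(X), even though (X, Y) has no joint density
print(np.cov(X, Y)[0, 1], alpha * np.var(X))
```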

*Finally,* if you are still paying attention, if $n$ jointly normal

random variables $X_i$ have a singular covariance matrix $\Sigma$ and

mean vector $\mathbf m$, then

that means that there are $m < n$ independent standard normal random

variables $Y_j$ such that

$$(X_1,X_2,\ldots, X_n) = (Y_1,Y_2,\ldots, Y_m)\mathbf A + \mathbf m$$

where $\mathbf A$ is a $m\times n$ matrix, and all questions about

$(X_1,X_2,\ldots, X_n)$ can be stated in terms of $(Y_1,Y_2,\ldots, Y_m)$

and answered in terms of these iid random variables. Note that

$\Sigma = \mathbf A^T\mathbf A$.
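This representation also answers the sampling question from above: to draw from a degenerate MVN, factor $\Sigma = \mathbf A^T \mathbf A$, draw iid standard normals, and map them through $\mathbf A$. A sketch (the example $\Sigma$, the rank cutoff `1e-12`, and the sample size are arbitrary choices; the factorization here uses an eigendecomposition):

```python
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 2.0, 0.0],
                  [2.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])    # rank-2, hence singular, 3x3 covariance
m_vec = np.array([1.0, 2.0, 3.0])      # mean vector

# factor Sigma = A^T A with A of shape (m, n), m = rank(Sigma) < n
w, V = np.linalg.eigh(Sigma)
keep = w > 1e-12                       # rank cutoff is an arbitrary choice
A = np.sqrt(w[keep])[:, None] * V[:, keep].T   # m x n

n_samples = 500_000
Y = rng.standard_normal((n_samples, A.shape[0]))   # iid standard normals
X = Y @ A + m_vec                                   # degenerate MVN samples

print(np.allclose(A.T @ A, Sigma))                  # factorization is exact
print(np.allclose(np.cov(X.T), Sigma, atol=0.05))   # sample covariance matches
```

Every sample lies on the two-dimensional affine subspace $\mathbf m + \operatorname{range}(\boldsymbol{\Sigma})$, which is exactly why no density with respect to three-dimensional Lebesgue measure can exist.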

**Attribution**
*Source: Link, Question Author: Davor Josipovic, Answer Author: Michael Hardy*