In Chapter 9 of the book Pattern Recognition and Machine Learning (PRML), there is this part about Gaussian mixture models:
To be honest, I don’t really understand why this would create a singularity. Can anyone explain this to me? I’m just an undergraduate and a novice in machine learning, so my question may sound a little silly, but any help would be appreciated. Thank you very much.
If we fit a Gaussian to a single data point using maximum likelihood, we get a very spiky Gaussian that “collapses” onto that point. The maximum likelihood estimate of the variance is zero when there is only one point, which in the multivariate Gaussian case gives a singular covariance matrix, so it’s called the singularity problem.
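To make the collapse concrete, here is the one-dimensional case written out as a small sketch. The symbols $x_n$ (the data point the Gaussian sits on) and $\sigma$ (its standard deviation) are my own notation for illustration. When the mean equals the data point, the exponential term is 1, so only the normalisation constant is left:

$$
\mathcal{N}(x_n \mid x_n, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \longrightarrow \infty \quad \text{as } \sigma \to 0 .
$$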
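As the variance goes to zero, the likelihood of the Gaussian component (formula 9.15) goes to infinity and the model overfits. This doesn’t occur when we fit a single Gaussian to a number of distinct points: shrinking the variance makes the density at every other point go to zero, so the overall likelihood goes to zero rather than infinity. But it can happen when we have a mixture of Gaussians, because one component can collapse onto a single data point while the other components keep the likelihood of the remaining points finite, as illustrated on the same page of PRML.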
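A small numerical sketch may help here. It is only an illustration under my own assumptions: a made-up 1-D data set, one Gaussian centred on the first data point with a shrinking standard deviation `sigma`, and (for the mixture) a second, broad component with fixed parameters and equal mixing weights of 0.5. The function names are hypothetical and not taken from PRML.

```python
import numpy as np

# Toy 1-D data set, made up purely for illustration.
data = np.array([-1.0, 0.0, 0.5, 2.0])

def log_gaussian(x, mean, sigma):
    """Log-density of a 1-D Gaussian with standard deviation sigma."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((x - mean) / sigma) ** 2

def single_loglik(sigma):
    """Log-likelihood of ONE Gaussian centred on the first data point."""
    return np.sum(log_gaussian(data, data[0], sigma))

def mixture_loglik(sigma):
    """Log-likelihood of a two-component mixture (weights 0.5 each, assumed).

    Component 1 sits on the first data point and has standard deviation sigma.
    Component 2 is broad and fixed: mean = data mean, standard deviation 1.
    """
    log_p1 = np.log(0.5) + log_gaussian(data, data[0], sigma)
    log_p2 = np.log(0.5) + log_gaussian(data, data.mean(), 1.0)
    # logaddexp computes log(exp(a) + exp(b)) without underflow.
    return np.sum(np.logaddexp(log_p1, log_p2))

if __name__ == "__main__":
    for sigma in [1e-1, 1e-3, 1e-6]:
        print(f"sigma={sigma:g}  single={single_loglik(sigma):.3e}  "
              f"mixture={mixture_loglik(sigma):.3f}")
```

As `sigma` shrinks, the single-Gaussian log-likelihood should fall without bound (the other points become ever less likely), while the mixture log-likelihood keeps growing, because the broad component still explains the remaining points.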
The book suggests two methods for addressing the singularity problem, which are