Consider the log likelihood of a mixture of Gaussians:

l(Sn;θ)=n∑t=1logf(x(t)|θ)=n∑t=1log{k∑i=1pif(x(t)|μ(i),σ2i)}

I was wondering why it was computationally hard to maximize that equation directly? I was looking for either a clear solid intuition on why it should be obvious that its hard or maybe a more rigorous explanation of why its hard. Is this problem NP-complete or do we just not know how to solve it yet? Is this the reason we resort to use the EM (expectation-maximization) algorithm?

Notation:

Sn = training data.

x(t) = data point.

θ = the set of parameters specifying the Gaussian, theirs means, standard deviations and the probability of generating a point from each cluster/class/Gaussian.

pi = the probability of generating a point from cluster/class/Gaussian i.

**Answer**

First, GMM is a particular algorithm for clustering, where you try to find the optimal labelling of your n observations. Having k possible classes, it means that there are kn possible labellings of your training data. This becomes already huge for moderate values of k and n.

Second, the functional you are trying to minimize is not convex, and together with the size of your problem, makes it very hard. I only know that k-means (GMM can be seen as a soft version of kmeans) is NP-hard. But I am not aware of whether it has been proved for GMM as well.

To see that the problem is not convex, consider the one dimensional case:

L=log(e−(x/σ1)2+e−(x/σ2)2)

and check that you cannot guarantee that d2Ldx2>0 for all x.

Having a non-convex problem means that you can get stuck in local minima. In general, you do not have the strong warranties you have in convex optimization, and searching for a solution is also much harder.

**Attribution***Source : Link , Question Author : Charlie Parker , Answer Author : Xi’an*