# Why is optimizing a mixture of Gaussians directly computationally hard?

Consider the log-likelihood of a mixture of Gaussians:

$$L(\theta; S_n) = \sum_{t=1}^{n} \log \sum_{i=1}^{k} p_i \, \mathcal{N}\!\left(x^{(t)} \mid \mu_i, \sigma_i^2\right)$$

I was wondering why it is computationally hard to maximize this equation directly. I am looking for either a clear, solid intuition for why it should be obvious that it's hard, or a more rigorous explanation of why it's hard. Is this problem NP-complete, or do we just not know how to solve it yet? Is this the reason we resort to using the EM (expectation-maximization) algorithm?

Notation:

$S_n$ = training data.

$x^{(t)}$ = data point.

$\theta$ = the set of parameters specifying the Gaussians: their means, standard deviations, and the probability of generating a point from each cluster/class/Gaussian.

$p_i$ = the probability of generating a point from cluster/class/Gaussian $i$.
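To make the objective concrete, here is a minimal numeric sketch of the log-likelihood above for 1-D Gaussians (the function name and example parameter values are my own, for illustration only):

```python
import math

def gmm_log_likelihood(data, means, sigmas, weights):
    """Log-likelihood of 1-D data under a Gaussian mixture (illustrative helper)."""
    total = 0.0
    for x in data:
        # p(x) = sum_i p_i * N(x; mu_i, sigma_i^2)
        px = sum(
            w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for m, s, w in zip(means, sigmas, weights)
        )
        total += math.log(px)
    return total

# Two well-separated clusters: the "true" parameters score higher than a bad guess.
data = [-3.1, -2.9, -3.0, 2.9, 3.1, 3.0]
good = gmm_log_likelihood(data, means=[-3, 3], sigmas=[1, 1], weights=[0.5, 0.5])
bad  = gmm_log_likelihood(data, means=[0, 0], sigmas=[1, 1], weights=[0.5, 0.5])
print(good > bad)  # the true means fit the data better
```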

First, fitting a GMM is a particular approach to clustering, where you try to find the optimal labelling of your $n$ observations. With $k$ possible classes, there are $k^n$ possible labellings of your training data. This number already becomes huge for moderate values of $k$ and $n$.
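A quick sanity check of that combinatorial blow-up (a throwaway sketch; the helper name is mine):

```python
from itertools import product

# Brute force over all k^n hard assignments of n points to k clusters.
def count_labellings(n, k):
    return sum(1 for _ in product(range(k), repeat=n))

print(count_labellings(5, 3))  # 3^5 = 243, still enumerable
print(3 ** 30)                 # already about 2 * 10^14: enumeration is hopeless
```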
Second, write down the negative log-likelihood as the objective to minimize, take its second derivative with respect to the parameters, and check that you cannot guarantee that $\frac{d^2 L}{d\theta^2} > 0$ for all $\theta$: the objective is not convex, so it can have multiple local optima, and direct gradient-based optimization is not guaranteed to find the global one.
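You can verify this numerically on a toy example (a sketch under my own assumed setup: two data points, one component's mean fixed, and a finite-difference estimate of the curvature along the slice $\mu \mapsto L(\mu)$):

```python
import math

data = [-3.0, 3.0]

def log_lik(mu):
    # Mixture 0.5*N(-3, 1) + 0.5*N(mu, 1) evaluated on the two data points.
    total = 0.0
    for x in data:
        px = 0.5 * math.exp(-0.5 * (x + 3.0) ** 2) / math.sqrt(2 * math.pi) \
           + 0.5 * math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)
        total += math.log(px)
    return total

# Finite-difference second derivative of L along mu in [-8, 8].
h = 0.1
curvatures = [
    (log_lik(mu + h) - 2 * log_lik(mu) + log_lik(mu - h)) / h ** 2
    for mu in [x / 10 for x in range(-80, 81)]
]
print(min(curvatures) < 0 < max(curvatures))  # both signs: L is neither concave nor convex
```

Since the curvature changes sign even along this single one-dimensional slice, the full likelihood surface cannot be concave (nor the negative log-likelihood convex), which is exactly why direct maximization is troublesome and why EM's iterative lower-bound approach is attractive.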