# Training a basic Markov Random Field for classifying pixels in an image

I am attempting to learn how to use Markov Random Fields to segment regions in an image. I do not understand some of the parameters in the MRF or why the expectation maximisation I perform fails to converge to a solution sometimes.

Starting from Bayes’ theorem, I have $p(x|y) = p(y|x) p(x) / p(y)$, where $y$ is the pixel’s gray-scale value and $x$ is a class label. I have chosen to use a Gaussian distribution for $p(y|x)$, while $p(x)$ is modelled using the MRF.

I use a potential function for the MRF that has both pairwise clique potentials and a potential value for the class label of the pixel being classified. The single pixel potential value is some constant $\alpha$ that depends on the class label $x$. The pairwise potential functions are evaluated for the 4-connected neighbours and return positive $\beta$ if the neighbour has the same class label as this pixel and $-\beta$ if the labels differ.

At the point in the expectation maximisation where I have to find the values of $\alpha(x)$ and $\beta$ that maximise the expected value of the log-likelihood I used a numerical optimisation method (tried conjugate gradient, BFGS, Powell’s method) but would always find that the value of $\beta$ would become negative, the $\alpha$s would increase dramatically and an iteration or two later the whole image would be assigned to one label only (background: assigning class labels given MRF parameters was done using ICM). If I removed the alphas, i.e. only using pairwise clique potentials, then the expectation maximisation would work just fine.

Please explain what is the purpose of the alphas for each class? I thought they would be related to the amount of that class that is present in the image, but it appears not. Once I got the MRF working with only pairwise potentials, I compared it to a straight forward Gaussian Mixture Model and found that they produced almost identical results. I was expecting the pairwise potentials to smooth out the classes a bit, but that didn’t happen. Please advise where I went wrong.

## Diagnosis

This sounds like an initialization problem.

The MRF model that you are using is non-convex and, as such, has multiple local minima. As far as I know, all existing optimization techniques are sensitive to initialization, meaning that the quality of the final solution is highly affected by where you start the optimization procedure from.

## Suggested Solution

I suggest trying different strategies to initialize the model. For example, one strategy that comes to my mind is the following:

1. train a model for $p(y | x)$ first and ignore the prior term for now; that is fix $p(x)$ to be uniform, for example, by setting $\alpha = \beta = 0$ and keeping them fixed. If you want to be fancier, you can fix $p(x)$ to be a mutinomimal distribution that represents the relative frequencies of labels in the training set. You can do this by setting $\alpha$ values appropriately.

2. unfreeze the unary and pairwise terms in the MRF model; that is, let your optimizer change the value of $\alpha$ and $\beta$.

The suggested initialization is, by no means, the best way to initialize your optimization, but rather, just one possible option.

Finally, as Roman Shapovalov suggested, you can consider regularizing your prior parameters; for example, by putting a Gaussian prior on them: $\lambda_\alpha ||\alpha||^2 + \lambda_\beta ||\beta||^2$ where $\lambda_\alpha$ and $\lambda_\beta$ are hyper-parameters that can be interpreted as the variances of the Gaussian priors.