# MLE for normal distribution with restrictive parameters

Suppose that $$X_1, \ldots, X_n$$, $$n\geq 2$$, is a sample from a $$N(\mu,\sigma^2)$$ distribution. Suppose $$\mu$$ and $$\sigma^2$$ are both known to be nonnegative but otherwise unspecified. I want to find the MLE of $$\mu$$ and $$\sigma^2$$. I have derived the MLE for the unrestricted parameters, but I am stuck on this one.

# Solution

Let $$\bar{x}$$ denote the sample mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$$

The constrained maximum likelihood mean $$\hat{\mu}$$ and variance $$\hat{\sigma}^2$$ are:

$$\hat{\mu} = \left\{ \begin{array}{cl} \bar{x} & \bar{x} \ge 0 \\ 0 & \text{Otherwise} \\ \end{array} \right.$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})^2$$

That is, we simply take the sample mean and clip it to zero if it’s negative. Then, plug it into the usual expression for the (uncorrected) sample variance. I obtained these expressions by setting up the constrained optimization problem, then solving for the parameters that satisfy the KKT conditions, as described below.
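As a quick illustration of the closed-form result, here is a minimal Python sketch. The function name `constrained_mle` is my own choice for this example, and it assumes the sample is passed as a plain list of floats.

```python
def constrained_mle(xs):
    """Return (mu_hat, sigma2_hat) for a normal sample under the constraint mu >= 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    # Clip the sample mean at zero.
    mu_hat = max(x_bar, 0.0)
    # Uncorrected (1/n) variance, taken about the clipped mean.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


print(constrained_mle([1.0, 2.0, 3.0]))    # sample mean is positive, so no clipping
print(constrained_mle([-3.0, -1.0, 1.0]))  # sample mean is negative, so mu_hat is clipped to 0
```

Note that when clipping occurs, the variance estimate is computed about zero rather than about the sample mean, so it is larger than the usual uncorrected sample variance.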

# Derivation

### Objective function

Maximizing the likelihood is equivalent to minimizing the negative log likelihood $$L(\mu, \sigma^2)$$, which will be more convenient to work with:

$$L(\mu, \sigma^2) = -\sum_{i=1}^n \log \mathcal{N}(x_i \mid \mu, \sigma^2)$$

$$= \frac{n}{2} \log(2 \pi) + \frac{n}{2} \log(\sigma^2) + \frac{1}{2 \sigma^2} \sum_{i=1}^n (x_i-\mu)^2$$

We’ll also need its partial derivatives w.r.t. $$\mu$$ and $$\sigma^2$$:

$$\frac{\partial}{\partial \mu} L(\mu, \sigma^2) = \frac{n \mu}{\sigma^2} - \frac{1}{\sigma^2} \sum_{i=1}^n x_i$$

$$\frac{\partial}{\partial \sigma^2} L(\mu, \sigma^2) = \frac{n}{2 \sigma^2} - \frac{1}{2 \sigma^4} \sum_{i=1}^n (x_i-\mu)^2$$
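One way to sanity-check these derivatives is to compare them against central finite differences. The sketch below is only a check, not part of the derivation. The function names, the test point, and the step size `h` are arbitrary choices made for illustration.

```python
import math


def neg_log_lik(xs, mu, sigma2):
    """Negative log likelihood of an i.i.d. normal sample."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return 0.5 * n * math.log(2 * math.pi) + 0.5 * n * math.log(sigma2) + ss / (2 * sigma2)


def grad_neg_log_lik(xs, mu, sigma2):
    """Analytic partial derivatives w.r.t. mu and sigma^2."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    d_mu = n * mu / sigma2 - sum(xs) / sigma2
    d_sigma2 = n / (2 * sigma2) - ss / (2 * sigma2 ** 2)
    return d_mu, d_sigma2


def finite_diff_grad(xs, mu, sigma2, h=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    d_mu = (neg_log_lik(xs, mu + h, sigma2) - neg_log_lik(xs, mu - h, sigma2)) / (2 * h)
    d_sigma2 = (neg_log_lik(xs, mu, sigma2 + h) - neg_log_lik(xs, mu, sigma2 - h)) / (2 * h)
    return d_mu, d_sigma2


xs = [-3.0, -1.0, 1.0, 0.5]
print(grad_neg_log_lik(xs, 0.7, 1.3))
print(finite_diff_grad(xs, 0.7, 1.3))  # should agree closely with the line above
```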

### Optimization problem

The goal is to find the parameters $$\hat{\mu}$$ and $$\hat{\sigma}^2$$ that minimize the negative log likelihood, subject to a non-negativity constraint on the mean. The variance is non-negative by definition and the solution below turns out to automatically respect this constraint, so we don’t need to impose it explicitly. The optimization problem can be written as:

$$\hat{\mu}, \hat{\sigma}^2 = \arg \min_{\mu, \sigma^2} \ L(\mu, \sigma^2) \quad \text{s.t. } g(\mu, \sigma^2) \le 0$$

$$\text{where } \ g(\mu, \sigma^2) = -\mu$$

I’ve written the constraint this way to follow convention, which should hopefully make it easier to match this up with other discussions about constrained optimization. In our problem, this just amounts to the constraint $$\mu \ge 0$$.
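To make the problem statement concrete, the following rough sketch minimizes the negative log likelihood by brute force over a grid restricted to $$\mu \ge 0$$. The grid bounds and resolution are arbitrary assumptions picked for a small example, and this is meant only as a numerical cross-check, not as a practical estimator.

```python
import math


def neg_log_lik(xs, mu, sigma2):
    """Negative log likelihood of an i.i.d. normal sample."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return 0.5 * n * math.log(2 * math.pi) + 0.5 * n * math.log(sigma2) + ss / (2 * sigma2)


def grid_search(xs, mu_max=5.0, sigma2_max=10.0, steps=200):
    """Minimize neg_log_lik over a grid with mu in [0, mu_max], sigma2 in (0, sigma2_max]."""
    best = None
    for i in range(steps + 1):
        mu = mu_max * i / steps  # only feasible values, mu >= 0
        for j in range(1, steps + 1):
            sigma2 = sigma2_max * j / steps  # strictly positive variance
            value = neg_log_lik(xs, mu, sigma2)
            if best is None or value < best[0]:
                best = (value, mu, sigma2)
    return best[1], best[2]


# Sample mean is negative here, so the constraint should be active.
print(grid_search([-3.0, -1.0, 1.0]))
```

On a sample with a negative mean, the grid minimizer should sit on the boundary $$\mu = 0$$, with a variance close to the mean of the squared observations (up to the grid spacing).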

### KKT conditions

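If $$(\hat{\mu}, \hat{\sigma}^2)$$ is an optimal solution, there must exist a constant $$\lambda$$ such that the KKT conditions hold: 1) stationarity, 2) primal feasibility, 3) dual feasibility, and 4) complementary slackness. The constraint is linear, hence convex and continuously differentiable. The loss is not jointly convex in $$(\mu, \sigma^2)$$ as written, but it is convex in the natural parameters $$(\mu / \sigma^2, 1 / \sigma^2)$$, under which the constraint $$\mu \ge 0$$ remains linear. In that parameterization the KKT conditions are sufficient for optimality, and a point satisfies the KKT conditions in one parameterization exactly when it does in the other (with a rescaled multiplier), so we can find the solution by solving for the parameters that satisfy these conditions.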

Stationarity:

$$\frac{\partial}{\partial \mu} L(\hat{\mu}, \hat{\sigma}^2) + \lambda \frac{\partial}{\partial \mu} g(\hat{\mu}, \hat{\sigma}^2) = 0$$

$$\frac{\partial}{\partial \sigma^2} L(\hat{\mu}, \hat{\sigma}^2) + \lambda \frac{\partial}{\partial \sigma^2} g(\hat{\mu}, \hat{\sigma}^2) = 0$$

Plug in expressions for the derivatives and solve for the parameters:

$$\hat{\mu} = \frac{1}{n} \hat{\sigma}^2 \lambda + \frac{1}{n} \sum_{i=1}^n x_i \tag{1}$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i-\hat{\mu})^2 \tag{2}$$

Primal feasibility:

$$g(\hat{\mu}, \hat{\sigma}^2) \le 0 \implies \hat{\mu} \ge 0$$

This just says that the parameters must respect the constraint.

Dual feasibility:

$$\lambda \ge 0$$

Complementary slackness:

$$\lambda g(\hat{\mu}, \hat{\sigma}^2) = 0 \implies \lambda \hat{\mu} = 0$$

This says that either $$\lambda$$ or $$\hat{\mu}$$ (or both) must be zero.

### Solving

Note that the RHS of equation $$(1)$$ is a multiple of $$\lambda$$ plus the sample mean $$\frac{1}{n} \sum_{i=1}^n x_i$$. If the sample mean is non-negative, set $$\lambda$$ to zero (satisfying the dual feasibility and complementary slackness conditions). It then follows from equation $$(1)$$ (the stationarity condition) that $$\hat{\mu}$$ is equal to the sample mean. This also satisfies the primal feasibility condition, since it’s non-negative.

Otherwise, if the sample mean is negative, set $$\hat{\mu}$$ to zero (satisfying the primal feasibility and complementary slackness conditions). To satisfy equation $$(1)$$ (the stationarity condition), set $$\lambda = -\hat{\sigma}^{-2} \sum_{i=1}^n x_i$$. Since the sample mean is negative and the variance estimate is positive, $$\lambda$$ takes a positive value, satisfying the dual feasibility condition.

In both cases, we can plug $$\hat{\mu}$$ into equation $$(2)$$ to obtain $$\hat{\sigma}^2$$.
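The two cases can also be checked numerically. The sketch below builds $$(\hat{\mu}, \hat{\sigma}^2, \lambda)$$ as described and evaluates each KKT condition. The function names are hypothetical, and it assumes the observations are not all identical, so that $$\hat{\sigma}^2 > 0$$.

```python
def kkt_solution(xs):
    """Return (mu_hat, sigma2_hat, lam) following the two cases above."""
    n = len(xs)
    total = sum(xs)
    mu_hat = max(total / n, 0.0)
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    # lam = 0 when the constraint is inactive; otherwise lam = -sum(x_i) / sigma2_hat.
    lam = 0.0 if total >= 0 else -total / sigma2_hat
    return mu_hat, sigma2_hat, lam


def kkt_report(xs, mu_hat, sigma2_hat, lam):
    """Evaluate the four KKT conditions at a candidate point."""
    n = len(xs)
    ss = sum((x - mu_hat) ** 2 for x in xs)
    return {
        # dL/dmu + lam * dg/dmu, where dg/dmu = -1; should be zero
        "stationarity_mu": n * mu_hat / sigma2_hat - sum(xs) / sigma2_hat - lam,
        # dL/dsigma2 (g does not depend on sigma2); should be zero
        "stationarity_sigma2": n / (2 * sigma2_hat) - ss / (2 * sigma2_hat ** 2),
        "primal_feasible": mu_hat >= 0,
        "dual_feasible": lam >= 0,
        # lam * mu_hat; should be zero
        "complementary_slackness": lam * mu_hat,
    }


for sample in ([1.0, 2.0, 3.0], [-3.0, -1.0, 1.0]):
    mu_hat, sigma2_hat, lam = kkt_solution(sample)
    print(sample, kkt_report(sample, mu_hat, sigma2_hat, lam))
```

In both samples the stationarity residuals and the complementary slackness product should be zero up to floating-point error, with both feasibility flags true.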