# Are all log-likelihood functions twice differentiable?

For maximum likelihood estimation we need to set the first derivative of the log-likelihood function equal to $$\mathbf{0}$$.

The negative expected value of the Hessian matrix (second derivative) is then called the Fisher information matrix.

Is there anything inherent to the definition of a log-likelihood (probability density) function that guarantees twice differentiability of the log-likelihood? If not, what conditions must I impose to guarantee it?

In short: no. To maximise the log-likelihood we frequently use differentiation, but to truly maximise a function we need to consider several types of points:

• Stationary/turning points (when $$\frac{\partial \ell}{\partial \theta} = 0$$)
• Singular points (e.g. where the function cannot be differentiated)
• End points – relevant when the parameter space is an interval $$[a,b]$$ (possibly with $$a = -\infty$$ or $$b = \infty$$)

Of course, that is provided that the parameter of interest is actually continuous.

Let’s consider the Laplace distribution with density

$$p(x \mid \mu, b) = \frac{1}{2b} \exp \left\{ -\frac{|x - \mu|}{b} \right\}$$

Then the log-likelihood is, given a sample $$\mathbf{x}$$ of size $$n$$

$$\ell(\mu, b \mid \mathbf{x} ) = -n \log (2b) - \sum_{i=1}^n \frac{|x_i - \mu|}{b}$$

It can be shown that $$\hat{b} = \frac{1}{n} \sum_{i=1}^n |x_i - \hat{\mu}|$$. The difficult bit is finding $$\hat{\mu}$$.
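These closed-form estimates can be checked numerically; a minimal sketch (the helper name `laplace_mle` is my own, not standard):

```python
def laplace_mle(xs):
    """Return (mu_hat, b_hat) for an i.i.d. Laplace sample:
    mu_hat is the sample median, b_hat the mean absolute
    deviation about it."""
    s = sorted(xs)
    n = len(s)
    # sample median: middle value for odd n,
    # midpoint of the two middle values for even n
    if n % 2 == 1:
        mu_hat = s[n // 2]
    else:
        mu_hat = 0.5 * (s[n // 2 - 1] + s[n // 2])
    b_hat = sum(abs(x - mu_hat) for x in s) / n
    return mu_hat, b_hat

mu_hat, b_hat = laplace_mle([1.0, 2.0, 4.0, 7.0, 11.0])
print(mu_hat, b_hat)  # → 4.0 3.0
```

One can also verify directly that evaluating the log-likelihood at each data point never beats the median, consistent with the argument below.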

Now if we differentiate w.r.t. $$\mu$$ then we need to differentiate $$|x_i - \mu|$$. If $$\mu \neq x_i$$ for every $$x_i$$ then $$\frac{\partial \ell}{\partial \mu} = \frac{1}{b} \sum_{i=1}^n \text{sign}(x_i - \mu),$$ which can be zero only if $$n$$ is even (and even then it might be non-zero!). At any $$\mu \in \mathbf{x}$$ the gradient does not exist.

So the log-likelihood is not differentiable at any $$\mu$$ equal to one of the $$x_i$$. Now assume $$n$$ is odd; it can be shown that $$\hat{\mu}$$ is the sample median, and the sample median is one of the $$x_i$$ (the middle $$x_i$$ when the $$x_i$$ are put in order). Therefore, the m.l.e. is at one of the non-differentiable points – a singular point!
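The kink at the median can be seen numerically by comparing one-sided difference quotients of the log-likelihood in $$\mu$$ (a sketch; `loglik` is a hypothetical helper implementing the formula above):

```python
import math

def loglik(mu, b, xs):
    """Laplace log-likelihood from the formula above."""
    n = len(xs)
    return -n * math.log(2 * b) - sum(abs(x - mu) for x in xs) / b

xs = [1.0, 2.0, 4.0, 7.0, 11.0]  # n = 5 (odd), sample median = 4.0
mu_hat, b = 4.0, 3.0
h = 1e-6
left = (loglik(mu_hat, b, xs) - loglik(mu_hat - h, b, xs)) / h
right = (loglik(mu_hat + h, b, xs) - loglik(mu_hat, b, xs)) / h
# With n odd, just off the median the sign-sum is +/-1,
# so the one-sided slopes are +1/b and -1/b: no derivative at mu_hat.
print(left, right)
```

Both one-sided slopes are finite but unequal, which is exactly the singular-point behaviour described above.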

How can we guarantee that the log-likelihood is differentiable? I don’t think we can actually force this to be true unless we choose a log-likelihood that is twice differentiable. I’d view this as a modelling choice or an assumption, rather than something we can guarantee. Other assumptions might imply a twice-differentiable log-likelihood, but in general I can’t see how we would end up with such a log-likelihood.