For maximum likelihood estimation we need to set the first derivative of the log-likelihood function equal to $\mathbf{0}$. The negative expected value of the Hessian matrix (the matrix of second derivatives) is then called the Fisher information matrix.

Is there anything inherent in the definition of a log-likelihood (probability density) function that guarantees twice differentiability of the log-likelihood? If not, what conditions must I impose to guarantee it?

**Answer**

In short: no. Note that to maximise the log-likelihood we frequently use differentiation, but to truly maximise a function we need to consider several types of points:

- Stationary/turning points (where $\frac{\partial \ell}{\partial \theta} = 0$)
- Singular points (e.g. points where the function is not differentiable)
- End points – these arise on an interval $[a,b]$, where one of $a$ or $b$ may be infinite

Of course, that is provided that the parameter of interest is actually continuous.

Let’s consider the Laplace distribution with density

$$p(x \mid \mu, b) = \frac{1}{2b} \exp \left\{ -\frac{|x - \mu|}{b} \right\}$$

Then, given a sample $\mathbf{x}$ of size $n$, the log-likelihood is

$$ \ell(\mu, b \mid \mathbf{x} ) = -n \log (2b) - \sum_{i=1}^n \frac{|x_i - \mu|}{b}$$

It can be shown that $\hat{b} = \frac{1}{n} \sum_{i=1}^n |x_i - \hat{\mu}|$. The difficult bit is finding $\hat{\mu}$.
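The closed form for $\hat{b}$ is easy to check numerically: with $\mu$ fixed at its estimate, the log-likelihood in $b$ should peak at the mean absolute deviation. A minimal sketch (the sample here is simulated and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(loc=2.0, scale=1.5, size=201)  # illustrative Laplace sample
mu_hat = np.median(x)

def loglik_b(b):
    # log-likelihood in b with mu held fixed at mu_hat
    return -len(x) * np.log(2 * b) - np.sum(np.abs(x - mu_hat)) / b

b_hat = np.mean(np.abs(x - mu_hat))  # the claimed closed form
b_grid = np.linspace(0.5 * b_hat, 2.0 * b_hat, 2001)
b_best = b_grid[np.argmax([loglik_b(b) for b in b_grid])]
print(abs(b_best - b_hat) < (b_grid[1] - b_grid[0]))  # grid argmax sits at b_hat
```

The grid search lands on (a grid point next to) the analytic $\hat{b}$, as the first-order condition $-n/b + \sum_i |x_i - \hat{\mu}|/b^2 = 0$ predicts.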

Now if we differentiate w.r.t. $\mu$ then we need to differentiate $|x_i - \mu|$. If $\mu \neq x_i$ for every $i$ then $\frac{\partial \ell}{\partial \mu} = \frac{1}{b}\sum_{i=1}^n\operatorname{sign}(x_i - \mu)$, which can be zero *only* if $n$ is even (and even then it might be nonzero!). At any $\mu \in \mathbf{x}$ the gradient **does not exist!**
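A quick sketch makes the odd-$n$ case concrete: between consecutive data points the sum of signs is a constant odd integer, so it can never hit zero (the data below are hypothetical, $b = 1$ for simplicity):

```python
import numpy as np

x = np.array([1.0, 2.0, 5.0])  # hypothetical sample, odd n = 3
b = 1.0

def score_mu(mu):
    # derivative of the log-likelihood in mu; valid only for mu not in x
    return np.sum(np.sign(x - mu)) / b

# Between consecutive data points the score is a nonzero constant:
print(score_mu(0.0))   # 3.0  (all x_i above mu)
print(score_mu(1.5))   # 1.0
print(score_mu(3.0))   # -1.0
print(score_mu(6.0))   # -3.0
```

With $n$ odd the score jumps by $2/b$ as $\mu$ crosses each $x_i$ and is never zero in between, so no stationary point exists.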

So for any $\mu$ equal to one of the $x_i$, the log-likelihood is not differentiable at that point. Now assume $n$ is odd; it can be shown that $\hat{\mu}$ is actually the sample median. The sample median is one of the $x_i$ (the middle $x_i$ when the $x_i$ are in order). Therefore, the m.l.e. is at one of the non-differentiable points – a singular point!

How can we guarantee that the log-likelihood is differentiable? I don't think we can force this to be true *unless* we choose a model whose log-likelihood is twice differentiable. I'd view this as a modelling choice or an assumption, rather than something we can guarantee. Other assumptions might imply a twice-differentiable log-likelihood, but in general I can't see how we would otherwise end up with one.

**Attribution**

*Source : Link , Question Author : stollenm , Answer Author : jcken*