# Can the empirical Hessian of an M-estimator be indefinite?

Jeffrey Wooldridge, in his *Econometric Analysis of Cross Section and Panel Data* (page 357), says that the empirical Hessian “is not guaranteed to be positive definite, or even positive semidefinite, for the particular sample we are working with.”

This seems wrong to me. Numerical problems apart, the Hessian must be positive semidefinite: the M-estimator is defined as the parameter value that minimizes the objective function for the given sample, and it is well known that at a (local) minimum the Hessian is positive semidefinite.

Is my argument right?

[EDIT: The statement has been removed in the 2nd ed. of the book. See comment.]

BACKGROUND
Suppose that $\widehat \theta_N$ is an estimator obtained by minimizing
$${1 \over N}\sum_{i=1}^N q(w_i,\theta),$$
where $w_i$ denotes the $i$-th observation.
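(To make the setup concrete, here is a toy sketch of my own, with made-up data: for $q(w,\theta)=(w-\theta)^2$, the minimizer of the sample average is just the sample mean.)

```python
# Toy illustration (mine, not from the book): with q(w, theta) = (w - theta)^2,
# the M-estimator minimizing (1/N) * sum_i q(w_i, theta) is the sample mean.
import numpy as np

w = np.array([1.2, 0.7, 2.9, 1.5, 0.4])  # made-up observations

def Q(theta):
    """Sample average of the per-observation losses."""
    return np.mean((w - theta) ** 2)

# crude grid search for the minimizer
grid = np.linspace(0.0, 3.0, 30001)
theta_hat = grid[np.argmin([Q(t) for t in grid])]
# theta_hat agrees with w.mean() = 1.34
```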

Let’s denote the Hessian of $q$ by $H$,
$$H(q,\theta)_{ij}=\frac{\partial^2 q}{\partial \theta_i \partial \theta_j}$$

The asymptotic covariance of $\widehat \theta_N$ involves $E[H(q,\theta_0)]$, where $\theta_0$ is the true parameter value. One way to estimate it is to use the empirical Hessian

$$\widehat H=\frac{1}{N}\sum_{i=1}^N H(w_i,\widehat \theta_N),$$
where $H(w_i,\theta)$ denotes the Hessian of $q(w_i,\cdot)$ evaluated at $\theta$.

It is the definiteness of $\widehat H$ which is in question.
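(For concreteness, here is a small numerical sketch of mine, not from Wooldridge, that computes $\widehat H$ by central finite differences for a squared-error loss; the data and names are illustrative.)

```python
# Sketch: empirical Hessian (1/N) * sum_i Hessian_theta q(w_i, theta),
# approximated by central second differences. Illustrative example only.
import numpy as np

def q(w, theta):
    # per-observation loss: squared residual for the line y = theta0 + theta1*x
    x, y = w
    return (y - theta[0] - theta[1] * x) ** 2

def empirical_hessian(data, theta, h=1e-5):
    Q = lambda th: np.mean([q(w, th) for w in data])
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = h * np.eye(p)[i]
            ej = h * np.eye(p)[j]
            H[i, j] = (Q(theta + ei + ej) - Q(theta + ei - ej)
                       - Q(theta - ei + ej) + Q(theta - ei - ej)) / (4 * h ** 2)
    return H

data = [(1.0, 2.0), (2.0, 3.5), (3.0, 5.0)]
H_hat = empirical_hessian(data, np.array([0.5, 1.0]))
# for squared loss this equals (2/N) * sum_i [[1, x_i], [x_i, x_i^2]]
```

For this convex loss $\widehat H$ is positive semidefinite at every $\theta$; the question is about losses where that is less obvious.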

I think you’re right. Let’s distill your argument to its essence:

1. $\widehat \theta_N$ minimizes the function $Q$ defined as $Q(\theta) = {1 \over N}\sum_{i=1}^N q(w_i,\theta).$

2. Let $H$ be the Hessian of $Q$, whence $H(\theta)_{ij} = \frac{\partial^2 Q}{\partial \theta_i \partial \theta_j}$ by definition, and this in turn, by linearity of differentiation, equals $\frac{1}{N}\sum_{i=1}^N H(w_i, \theta)$.

3. Provided $\widehat \theta_N$ lies in the interior of the domain of $Q$, $H(\widehat \theta_N)$ must be positive semidefinite.

This is a statement about the function $Q$ alone: how it is defined is merely a distraction, except insofar as the assumed second-order differentiability of $q$ with respect to its second argument ($\theta$) assures the second-order differentiability of $Q$.

Finding M-estimators can be tricky. Consider these data provided by @mpiktas:

    {1.168042, 0.3998378}, {1.807516, 0.5939584}, {1.384942, 3.6700205},
    {1.327734, -3.3390724}, {1.602101, 4.1317608}, {1.604394, -1.9045958},
    {1.124633, -3.0865249}, {1.294601, -1.8331763}, {1.577610, 1.0865977},
    {1.630979, 0.7869717}


The R procedure to find the M-estimator with $q((x,y),\theta)=(y-c_1x^{c_2})^4$ produced the solution $(c_1, c_2) = (-114.91316, -32.54386)$. The value of the objective function (the average of the $q$’s) at this point equals 62.3542. Here is a plot of the fit:

Here is a plot of the (log) objective function in a neighborhood of this fit:

Something is fishy here: the parameters of the fit are extremely far from the parameters used to simulate the data (near $(0.3, 0.2)$) and we do not seem to be at a minimum: we are in an extremely shallow valley that is sloping towards larger values of both parameters:

The negative determinant of the Hessian at this point confirms that this is not a local minimum! Nevertheless, when you look at the z-axis labels, you can see that this function is flat to five-digit precision within the entire region, because it equals a constant 4.1329 (the logarithm of 62.354). This probably led the R function minimizer (with its default tolerances) to conclude it was near a minimum.

In fact, the solution is far from this point. To be sure of finding it, I employed the computationally expensive but highly effective “Principal Axis” method in Mathematica, using 50-digit precision (base 10) to avoid possible numerical problems. It finds a minimum near $(c_1, c_2) = (0.02506, 7.55973)$ where the objective function has the value 58.292655: about 6.5% smaller than the “minimum” found by R. This minimum occurs in an extremely flat-looking section, but I can make it look (just barely) like a true minimum, with elliptical contours, by exaggerating the $c_2$ direction in the plot:

The contours range from 58.29266 in the middle all the way up to 58.29284 in the corners(!). Here’s the 3D view (again of the log objective):

Here the Hessian is positive-definite: its eigenvalues are 55062.02 and 0.430978. Thus this point is a local minimum (and likely a global minimum). Here is the fit it corresponds to:
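(For readers who want to reproduce these checks without Mathematica, here is a rough Python translation of the computations; the finite-difference step sizes are my choices, tuned to the curvature at each point, and only the signs and leading digits should be trusted.)

```python
import numpy as np

# the data quoted above (from @mpiktas)
data = np.array([
    [1.168042, 0.3998378], [1.807516, 0.5939584], [1.384942, 3.6700205],
    [1.327734, -3.3390724], [1.602101, 4.1317608], [1.604394, -1.9045958],
    [1.124633, -3.0865249], [1.294601, -1.8331763], [1.577610, 1.0865977],
    [1.630979, 0.7869717],
])
x, y = data[:, 0], data[:, 1]

def Q(theta):
    # average quartic loss for the model y = c1 * x^c2
    c1, c2 = theta
    return np.mean((y - c1 * x ** c2) ** 4)

def hessian(f, theta, h=1e-5):
    # central finite-difference Hessian of f at theta
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = h * np.eye(p)[i]
            ej = h * np.eye(p)[j]
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h ** 2)
    return H

theta_R = np.array([-114.91316, -32.54386])  # point reported by R
theta_M = np.array([0.02506, 7.55973])       # point found by Mathematica

# negative determinant => indefinite Hessian => not a local minimum
det_R = np.linalg.det(hessian(Q, theta_R, h=1e-3))
# both eigenvalues positive => a genuine local minimum
eig_M = np.linalg.eigvalsh(hessian(Q, theta_M))
```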

I think it’s better than the other one. The parameter values are certainly more realistic and it’s clear we’re not going to be able to do much better with this family of curves.

There are useful lessons we can draw from this example:

1. Numerical optimization can be difficult, especially with nonlinear fitting and non-quadratic loss functions. Therefore:
2. Double-check results in as many ways as possible, including:
   - Graph the objective function whenever you can.
   - When numerical results appear to violate mathematical theorems, be extremely suspicious.
   - When statistical results are surprising, such as the parameter values returned by the R code here, be extra suspicious.