Let’s say I’m computing some model parameters by minimizing the sum of squared residuals, and I’m assuming my errors are Gaussian. My model produces analytical derivatives, so the optimizer does not need to use finite differences. Once the fit is complete, I want to calculate standard errors of the fitted parameters.
Generally, in this situation, the Hessian of the error function is taken to be related to the covariance matrix by:

$$C = \sigma^2 H^{-1}$$

where $\sigma^2$ is the variance of the residuals.
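For concreteness, here is a minimal NumPy sketch of that relationship (the function name and interface are my own, and it assumes $H$ is the Hessian of the sum-of-squares objective at the optimum and is well-conditioned):

```python
import numpy as np

def standard_errors(H, r, n_params):
    """Standard errors of fitted parameters from the Hessian H of the
    least-squares objective and the residual vector r at the optimum.
    Illustrative sketch only."""
    n_obs = len(r)
    # Unbiased estimate of the residual variance sigma^2
    # (degrees-of-freedom correction is my assumption here).
    sigma2 = r @ r / (n_obs - n_params)
    # Covariance matrix C = sigma^2 * H^{-1}.
    C = sigma2 * np.linalg.inv(H)
    # Standard errors are the square roots of the diagonal of C.
    return np.sqrt(np.diag(C))
```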
When no analytical derivatives of the error are available, it is typically impractical to compute the Hessian, so $J^TJ$ is taken as a good approximation.

However, in my case, I’ve got an analytical $J$, so it’s relatively cheap for me to compute $H$ by finite differencing $J$.

So, my question is this: Would it be more accurate to approximate $H$ using my exact $J$ and applying the above approximation, or to approximate $H$ by finite differencing $J$?
GOOD question. First, recall where this approximation $H \approx J^TJ$ comes from. Let $(x_i, y_i)$ be your data points, $f(\cdot)$ be your model, and $\beta$ be the parameters of your model. Then the objective function of the non-linear least squares problem is $\frac{1}{2} r^T r$, where $r$ is the vector of the residuals, $r_i = y_i - f(x_i, \beta)$. The exact Hessian of the objective function is $H = J^TJ + \sum r_i \nabla^2 r_i$. So the error in this approximation is $H - J^TJ = \sum r_i \nabla^2 r_i$. It’s a good approximation when the residuals themselves are small, or when the second derivative of the residuals is small. Linear least squares can be considered a special case where the second derivative of the residuals is zero.
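You can see the decomposition numerically on a toy one-parameter model $f(x; b) = e^{bx}$ (model, data, and numbers below are my own illustration, not from the question):

```python
import numpy as np

# Toy one-parameter model f(x; b) = exp(b*x), so every piece of
# H = J^T J + sum_i r_i * Hess(r_i) can be written down by hand.
x = np.linspace(0.0, 1.0, 10)
b = 0.5
y = np.exp(b * x) + 0.05 * np.cos(7 * x)   # data = model + a small misfit

r = y - np.exp(b * x)                       # residuals r_i
J = (-x * np.exp(b * x))[:, None]           # Jacobian dr_i/db, shape (10, 1)
d2r = -x**2 * np.exp(b * x)                 # second derivatives d^2 r_i / db^2

H_gn = J.T @ J                              # Gauss-Newton approximation
H_exact = H_gn + np.sum(r * d2r)            # exact Hessian of (1/2) r^T r
# The approximation error is exactly sum_i r_i * Hess(r_i),
# and it shrinks as the residuals shrink.
```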
As for the finite difference approximation, it is relatively cheap. To compute a central difference, you’ll need to evaluate the Jacobian an additional $2n$ times (a forward difference will cost you $n$ additional evaluations, so I wouldn’t bother). The error of the central difference approximation is proportional to $\nabla^4 r$ and $h^2$, where $h$ is the step size. The optimal step size is $h \sim \epsilon^{1/3}$, where $\epsilon$ is machine precision. So unless the derivatives of the residuals are blowing up, it’s pretty clear that the finite difference approximation should be a LOT better. I should point out that, while the computation is minimal, the bookkeeping is nontrivial. Each finite difference on the Jacobian will give you one row of the Hessian of each residual. You’ll then have to reassemble the Hessian using the formula above.
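That bookkeeping might look something like the following sketch (function name and interface are hypothetical): differencing $J$ along parameter $j$ yields, for every residual $i$, row $j$ of $\nabla^2 r_i$, and the slabs are then reassembled via $H = J^TJ + \sum r_i \nabla^2 r_i$.

```python
import numpy as np

def hessian_from_jacobian(jac, beta, r, h=None):
    """Exact-J Gauss-Newton term plus a central-differenced curvature term.

    jac(beta) -> (m, n) analytical Jacobian of the residuals,
    r         -> (m,) residual vector at beta.
    Sketch only; assumes jac is smooth near beta."""
    n = beta.size
    if h is None:
        h = np.finfo(float).eps ** (1.0 / 3.0)   # optimal step ~ eps^(1/3)
    J = jac(beta)
    # T[i, j, k] = d^2 r_i / (d beta_j d beta_k); each central difference
    # along parameter j fills in slab T[:, j, :].
    T = np.empty((J.shape[0], n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        T[:, j, :] = (jac(beta + e) - jac(beta - e)) / (2.0 * h)
    # Symmetrize the differenced tensor, then reassemble the Hessian.
    T = 0.5 * (T + T.transpose(0, 2, 1))
    return J.T @ J + np.einsum('i,ijk->jk', r, T)
```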
There is, however, a third option. If your solver uses a quasi-Newton method (DFP, BFGS, Broyden, etc.), it is already approximating the Hessian at each iteration. The approximation can be quite good, as it uses the objective function and gradient values from every iteration. Most solvers will give you access to the final Hessian estimate (or its inverse). If that’s an option for you, I would use it as the Hessian estimate. It’s already computed, and it’s probably going to be a pretty good estimate.
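For example, SciPy’s BFGS implementation exposes the final inverse-Hessian estimate as `res.hess_inv`, which can go straight into $C = \sigma^2 H^{-1}$ (the toy model and data below are my own; note the quality of a quasi-Newton estimate depends on the iteration history, so treat this as a convenience, not a guarantee):

```python
import numpy as np
from scipy.optimize import minimize

# Toy fit of f(x; b) = exp(b0*x) + b1 by BFGS, reusing the solver's
# final inverse-Hessian estimate for the parameter covariance.
x = np.linspace(0.0, 1.0, 30)
rng = np.random.default_rng(1)
y = np.exp(0.7 * x) + 0.3 + 0.01 * rng.normal(size=x.size)

def resid(b):
    return y - (np.exp(b[0] * x) + b[1])

def obj(b):
    r = resid(b)
    return 0.5 * r @ r

def grad(b):
    # Analytical gradient J^T r of the 0.5*r^T r objective.
    r = resid(b)
    J = np.column_stack([-x * np.exp(b[0] * x), -np.ones_like(x)])
    return J.T @ r

res = minimize(obj, x0=np.array([0.5, 0.0]), jac=grad, method="BFGS")
# res.hess_inv is the quasi-Newton estimate of H^{-1}.
sigma2 = resid(res.x) @ resid(res.x) / (x.size - 2)
se = np.sqrt(np.diag(sigma2 * res.hess_inv))
```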