# Relationship between Hessian Matrix and Covariance Matrix

While studying Maximum Likelihood Estimation, I learned that to do inference with MLE we need to know the variance. To find the variance, I need the Cramér–Rao Lower Bound, which looks like a Hessian matrix of second derivatives describing the curvature. I am somewhat mixed up about the relationship between the covariance matrix and the Hessian matrix. I hope to hear some explanations about the question. A simple example would be appreciated.

You should first check out this question: Basic question about Fisher Information matrix and relationship to Hessian and standard errors.

Suppose we have a statistical model (a family of distributions) $\{f_{\theta}: \theta \in \Theta\}$. In the most general case we have $dim(\Theta) = d$, so this family is parameterized by $\theta = (\theta_1, \dots, \theta_d)^T$. Under certain regularity conditions, we have

$$I_{i,j}(\theta) = -E_{\theta}\left[\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \log f_{\theta}(X)\right],$$

where $I_{i,j}$ is the Fisher Information matrix (as a function of $\theta$) and $X$ is the observed value (sample).

So the Fisher Information matrix is the negated expected value of the Hessian of the log-probability under a given $\theta$.
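To make this identity concrete, here is a small numeric sketch (mine, not part of the original answer) for a Bernoulli$(p)$ model, where the parameter is scalar and the Fisher information is known analytically to be $1/(p(1-p))$. We approximate the second derivative of the log-probability by finite differences and take the expectation over $X \in \{0, 1\}$:

```python
import math

def log_pmf(x, p):
    # log-probability of a Bernoulli(p) observation x in {0, 1}
    return x * math.log(p) + (1 - x) * math.log(1 - p)

def second_derivative(f, p, h=1e-5):
    # central finite-difference approximation of f''(p)
    return (f(p + h) - 2 * f(p) + f(p - h)) / h**2

def fisher_info_bernoulli(p):
    # I(p) = -E[ d^2/dp^2 log f(X; p) ], expectation taken over X ~ Bernoulli(p)
    return -sum(
        pmf * second_derivative(lambda t: log_pmf(x, t), p)
        for x, pmf in [(0, 1 - p), (1, p)]
    )

p = 0.3
print(fisher_info_bernoulli(p))  # numeric: negated expected "Hessian"
print(1 / (p * (1 - p)))         # analytic Fisher information
```

The two printed values agree to several decimal places, which is exactly the statement above: the Fisher information equals the negated expected second derivative (Hessian, in the multivariate case) of the log-probability.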

Now let’s say we want to estimate some vector function of the unknown parameter, $\psi(\theta)$. Usually it is desired that the estimator $T(X) = (T_1(X), \dots, T_d(X))$ should be unbiased, i.e.

$$E_{\theta}[T(X)] = \psi(\theta) \quad \text{for every } \theta \in \Theta.$$

The Cramér–Rao Lower Bound states that for every unbiased $T(X)$, the covariance matrix $cov_{\theta}(T(X))$ satisfies

$$cov_{\theta}(T(X)) \ge \frac{\partial\psi(\theta)}{\partial\theta}\,[I(\theta)]^{-1}\left(\frac{\partial\psi(\theta)}{\partial\theta}\right)^T = B(\theta),$$

where $A \ge B$ for matrices means that $A - B$ is positive semi-definite, and $\frac{\partial\psi(\theta)}{\partial\theta}$ is simply the Jacobian $J_{i,j}(\psi) = \frac{\partial\psi_i}{\partial\theta_j}$. Note that if we estimate $\theta$ itself, that is $\psi(\theta) = \theta$, the Jacobian is the identity and the above simplifies to

$$cov_{\theta}(T(X)) \ge [I(\theta)]^{-1} = B(\theta).$$

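A classic instance of this bound, sketched below as a Monte Carlo check (my example, with hypothetical parameter values): for $n$ i.i.d. $N(\mu, \sigma^2)$ observations with $\sigma$ known, the Fisher information for $\mu$ is $n/\sigma^2$, so the CRLB for any unbiased estimator of $\mu$ is $\sigma^2/n$. The sample mean is unbiased and attains this bound:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 2.0, 1.5, 50, 20000

# Repeatedly draw a sample of size n and record the sample mean,
# an unbiased estimator of mu.
estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    estimates.append(statistics.fmean(sample))

empirical_var = statistics.pvariance(estimates)
crlb = sigma**2 / n  # [I(mu)]^{-1} = sigma^2 / n

print(empirical_var, crlb)  # the two values should be close
```

Here the empirical variance of the estimator matches the bound, illustrating that the sample mean is an efficient (bound-attaining) estimator in this model.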
But what does it tell us really? For example, recall that

$$var_{\theta}(T_i(X)) = \left[cov_{\theta}(T(X))\right]_{i,i},$$

and that for every positive semi-definite matrix $A$ the diagonal elements are non-negative:

$$A_{i,i} = e_i^T A e_i \ge 0.$$

From the above we can conclude that the variance of each estimated element is bounded by the corresponding diagonal element of the matrix $B(\theta)$:

$$var_{\theta}(T_i(X)) \ge \left[B(\theta)\right]_{i,i}.$$

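As a small sketch of reading off these diagonal bounds (my example, using the standard Fisher information for the normal model): for $n$ i.i.d. $N(\mu, \sigma^2)$ observations with $\theta = (\mu, \sigma^2)$, the Fisher information matrix is $I(\theta) = n \cdot \mathrm{diag}(1/\sigma^2,\; 1/(2\sigma^4))$. Inverting it gives $B(\theta)$, whose diagonal bounds the variance of each component's estimator:

```python
import numpy as np

# Standard result: Fisher information for theta = (mu, sigma^2)
# from n i.i.d. N(mu, sigma^2) observations.
n, sigma2 = 100, 4.0
I = n * np.array([[1 / sigma2, 0.0],
                  [0.0, 1 / (2 * sigma2**2)]])

B = np.linalg.inv(I)  # B(theta) = [I(theta)]^{-1}

print(B[0, 0])  # bound on var(mu-hat):       sigma^2 / n   = 0.04
print(B[1, 1])  # bound on var(sigma^2-hat):  2 sigma^4 / n = 0.32
```

So each diagonal entry of $B(\theta)$ is a lower bound on the variance of the corresponding unbiased component estimator, exactly as stated above.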
So the CRLB doesn’t tell us the variance of our estimator, but whether or not our estimator is optimal, i.e. whether it has the lowest covariance among all unbiased estimators.