While studying Maximum Likelihood Estimation, I learned that to do inference we need to know the variance of the estimator. To find the variance, I need the Cramér-Rao Lower Bound, which looks like a Hessian matrix of second derivatives describing the curvature. I am confused about the relationship between the covariance matrix and the Hessian matrix. I hope to hear some explanation of this. A simple example would be appreciated.

**Answer**

You should first check out the question "Basic question about Fisher Information matrix and relationship to Hessian and standard errors".

Suppose we have a statistical model (family of distributions) $\{f_\theta : \theta \in \Theta\}$. In the most general case we have $\dim(\Theta) = d$, so this family is parameterized by $\theta = (\theta_1, \dots, \theta_d)^T$. Under certain regularity conditions, we have

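$$I_{i,j}(\theta) = -E_\theta\left[\frac{\partial^2 l(X;\theta)}{\partial\theta_i\,\partial\theta_j}\right] = -E_\theta\left[H_{i,j}(l(X;\theta))\right]$$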

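where $I_{i,j}$ is the $(i,j)$ entry of the Fisher information matrix (as a function of $\theta$), $H_{i,j}$ is the corresponding entry of the Hessian with respect to $\theta$, $X$ is the observed value (sample), and

$$l(X;\theta) = \ln(f_\theta(X)), \quad \text{for some } \theta \in \Theta$$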

So the Fisher information matrix is the **negated expected value of the Hessian of the log-probability under some $\theta$**.
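To make this definition concrete, here is a minimal sketch (an illustration added here, not part of the original answer). It assumes a three-category distribution with $\theta = (p_0, p_1)$ and $p_2 = 1 - p_0 - p_1$, chosen only because the expectation is then an exact sum over three outcomes. The helper names `log_pmf`, `hessian` and `fisher_information` are hypothetical. The Hessian is approximated by central finite differences and compared with the closed form $I_{i,j} = \delta_{i,j}/p_i + 1/p_2$.

```python
import math


def log_pmf(x, theta):
    """Log-probability of outcome x in {0, 1, 2}, with theta = (p0, p1)
    and the third probability fixed as p2 = 1 - p0 - p1."""
    p0, p1 = theta
    probs = (p0, p1, 1.0 - p0 - p1)
    return math.log(probs[x])


def hessian(f, theta, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    d = len(theta)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def shifted(si, sj):
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return f(t)
            # For i == j this reduces to the usual second difference with step 2h.
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return H


def fisher_information(theta):
    """Negated expected Hessian of the log-pmf; the expectation is an
    exact weighted sum over the three possible outcomes."""
    p0, p1 = theta
    probs = (p0, p1, 1.0 - p0 - p1)
    d = len(theta)
    info = [[0.0] * d for _ in range(d)]
    for x, prob in enumerate(probs):
        H = hessian(lambda t: log_pmf(x, t), theta)
        for i in range(d):
            for j in range(d):
                info[i][j] -= prob * H[i][j]
    return info


theta = [0.2, 0.5]                      # illustrative parameter values
p2 = 1.0 - sum(theta)
numeric = fisher_information(theta)
analytic = [[1.0 / theta[0] + 1.0 / p2, 1.0 / p2],
            [1.0 / p2, 1.0 / theta[1] + 1.0 / p2]]

print("finite-difference Fisher information:", numeric)
print("closed-form Fisher information:      ", analytic)
```

The two printed matrices should agree up to finite-difference error. For a continuous model the sum over outcomes would be replaced by an integral or a Monte Carlo average.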

Now let's say we want to estimate some vector function of the unknown parameter, $\psi(\theta)$. Usually it is desired that the estimator $T(X) = (T_1(X), \dots, T_d(X))$ be unbiased, i.e.

$$\forall \theta \in \Theta \quad E_\theta[T(X)] = \psi(\theta)$$

The Cramér-Rao Lower Bound (CRLB) states that for every **unbiased** $T(X)$, the covariance matrix $\operatorname{cov}_\theta(T(X))$ satisfies

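$$\operatorname{cov}_\theta(T(X)) \geq \frac{\partial\psi(\theta)}{\partial\theta}\, I^{-1}(\theta) \left(\frac{\partial\psi(\theta)}{\partial\theta}\right)^T = B(\theta)$$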

where $A \geq B$ for matrices means that $A - B$ is **positive semi-definite**, and $\frac{\partial\psi(\theta)}{\partial\theta}$ is simply the Jacobian $J_{i,j}(\psi)$. Note that if we estimate $\theta$ itself, that is $\psi(\theta) = \theta$, the above simplifies to

$$\operatorname{cov}_\theta(T(X)) \geq I^{-1}(\theta)$$
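As a small worked case (an illustration added here, not part of the original answer), assume the sample is $X = (X_1, \dots, X_n)$ with the $X_k$ independent and $N(\mu, \sigma^2)$ distributed, and take $\theta = (\mu, \sigma^2)^T$. The log-likelihood is

$$l(X;\theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(X_k - \mu)^2$$

and its second derivatives are

$$\frac{\partial^2 l}{\partial\mu^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 l}{\partial\mu\,\partial\sigma^2} = -\frac{1}{\sigma^4}\sum_{k=1}^{n}(X_k - \mu), \qquad \frac{\partial^2 l}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{k=1}^{n}(X_k - \mu)^2$$

Taking expectations with $E_\theta[X_k - \mu] = 0$ and $E_\theta[(X_k - \mu)^2] = \sigma^2$, then negating, gives

$$I(\theta) = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}, \qquad I^{-1}(\theta) = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}$$

So in this model any unbiased estimator of $(\mu, \sigma^2)$ has a covariance matrix that is at least $I^{-1}(\theta)$ in the positive semi-definite sense.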

But what does it tell us really? For example, recall that

$$\operatorname{var}_\theta(T_i(X)) = [\operatorname{cov}_\theta(T(X))]_{i,i}$$

and that for every positive semi-definite matrix $A$ the diagonal elements are non-negative

$$\forall i \quad A_{i,i} \geq 0$$

Applying this to the matrix $\operatorname{cov}_\theta(T(X)) - B(\theta)$, we can conclude that the variance of each estimated element is bounded below by the corresponding diagonal element of $B(\theta)$

$$\forall i \quad \operatorname{var}_\theta(T_i(X)) \geq [B(\theta)]_{i,i}$$

So the CRLB doesn't tell us the variance of our estimator. It gives a benchmark for **optimality**: an unbiased estimator whose covariance attains the bound has the lowest covariance among all unbiased estimators.
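Since the question asks for a simple example, here is a small simulation sketch (added for illustration, not part of the original answer). It assumes independent $N(\mu, \sigma^2)$ observations with $\theta = (\mu, \sigma^2)$, for which the diagonal of $I^{-1}(\theta)$ is $(\sigma^2/n,\ 2\sigma^4/n)$. The parameter values, sample size and number of replications are arbitrary choices. It compares the simulated variances of two unbiased estimators, the sample mean and the sample variance with divisor $n - 1$, against those bounds.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

mu, sigma2 = 1.0, 4.0   # true parameters (illustrative values)
n = 10                  # observations per sample
reps = 20000            # number of simulated samples
sigma = sigma2 ** 0.5


def mean(values):
    return sum(values) / len(values)


def unbiased_variance(values):
    """Sample variance with divisor n - 1 (unbiased for sigma^2)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def spread(values):
    """Empirical variance of a list of simulated estimates."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


mean_estimates, var_estimates = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean_estimates.append(mean(sample))
    var_estimates.append(unbiased_variance(sample))

# Diagonal entries of the inverse Fisher information for (mu, sigma^2)
bound_mean = sigma2 / n
bound_var = 2.0 * sigma2 ** 2 / n

var_of_mean = spread(mean_estimates)
var_of_s2 = spread(var_estimates)

print("sample mean:     simulated variance", var_of_mean, "bound", bound_mean)
print("sample variance: simulated variance", var_of_s2, "bound", bound_var)
```

The simulated variance of the sample mean should land close to its bound (up to Monte Carlo noise, so it can fall slightly on either side), which is what attaining the bound looks like. The sample variance should sit visibly above its bound, near the theoretical value $2\sigma^4/(n-1)$. Missing the bound does not by itself mean a better unbiased estimator exists; the bound is simply not always attainable.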

**Attribution** *Source: Link, Question Author: user122358, Answer Author: Community*