Relationship between Hessian Matrix and Covariance Matrix

While studying Maximum Likelihood Estimation, I learned that to do inference with a maximum likelihood estimate we need to know its variance. To find the variance, I need the Cramér-Rao Lower Bound, which looks like a Hessian matrix of second derivatives describing the curvature of the log-likelihood. I am mixed up about the relationship between the covariance matrix and the Hessian matrix. I hope to hear some explanations about this question. A simple example would be appreciated.

Answer

You should first check out this related question: Basic question about Fisher Information matrix and relationship to Hessian and standard errors.

Suppose we have a statistical model (family of distributions) $\{f_\theta : \theta \in \Theta\}$. In the most general case we have $\dim(\Theta) = d$, so this family is parameterized by $\theta = (\theta_1, \ldots, \theta_d)^T$. Under certain regularity conditions, we have

$$I_{i,j}(\theta) = -E_\theta\left[\frac{\partial^2 l(X;\theta)}{\partial \theta_i \, \partial \theta_j}\right] = -E_\theta\left[H_{i,j}(l(X;\theta))\right]$$

where $I_{i,j}$ is the Fisher Information matrix (as a function of $\theta$) and $X$ is the observed value (sample), with

$$l(X;\theta) = \ln(f_\theta(X)), \quad \text{for some } \theta \in \Theta.$$

So the Fisher Information matrix is the negated expected value of the Hessian of the log-probability under some $\theta$.
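Since the question asked for a simple example, here is a minimal sketch (assuming SymPy is available) for a one-parameter Bernoulli($p$) model: the Fisher information is obtained as the negated expected second derivative (the $1 \times 1$ Hessian) of the log-likelihood, and comes out to the familiar $1/(p(1-p))$.

```python
# Minimal sketch: Fisher information of a Bernoulli(p) model as the negated
# expected Hessian (here a 1x1 "matrix", i.e. the second derivative) of the
# log-likelihood. Assumes SymPy is installed.
import sympy as sp

x = sp.Symbol('x', real=True)
p = sp.Symbol('p', positive=True)

# log-likelihood of a single Bernoulli observation: x*ln(p) + (1 - x)*ln(1 - p)
log_lik = x * sp.log(p) + (1 - x) * sp.log(1 - p)

# second derivative with respect to the parameter (the 1x1 Hessian)
hessian = sp.diff(log_lik, p, 2)

# Expectation over X ~ Bernoulli(p): the Hessian is linear in x, so we can
# simply substitute x -> E[X] = p before negating.
fisher_info = sp.simplify(-hessian.subs(x, p))

print(fisher_info)  # 1/(p*(1 - p)), possibly printed in an equivalent form
```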

Now let’s say we want to estimate some vector function of the unknown parameter, $\psi(\theta)$. Usually it is desired that the estimator $T(X) = (T_1(X), \ldots, T_d(X))$ should be unbiased, i.e.

$$\forall_{\theta \in \Theta} \quad E_\theta[T(X)] = \psi(\theta)$$

The Cramér-Rao Lower Bound states that for every unbiased $T(X)$, the covariance matrix $\mathrm{cov}_\theta(T(X))$ satisfies

$$\mathrm{cov}_\theta(T(X)) \geq \frac{\partial \psi(\theta)}{\partial \theta} \, I^{-1}(\theta) \left(\frac{\partial \psi(\theta)}{\partial \theta}\right)^T = B(\theta)$$

where $A \geq B$ for matrices means that $A - B$ is positive semi-definite, and $\frac{\partial \psi(\theta)}{\partial \theta}$ is simply the Jacobian $J_{i,j}(\psi)$. Note that if we estimate $\theta$ itself, that is $\psi(\theta) = \theta$, the above simplifies to

$$\mathrm{cov}_\theta(T(X)) \geq I^{-1}(\theta)$$
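To make this bound tangible, here is a minimal sketch (assuming NumPy, and a normal model chosen purely for illustration): for $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$ with known $\sigma$, the Fisher information for $\mu$ is $I(\mu) = n/\sigma^2$, so the CRLB says $\mathrm{var}(T) \geq \sigma^2/n$ for any unbiased estimator of $\mu$; the sample mean attains the bound.

```python
# Minimal sketch: the sample mean attains the CRLB for the mean of a normal
# distribution with known variance. Assumes NumPy is installed.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, n_rep = 2.0, 3.0, 50, 100_000

samples = rng.normal(mu, sigma, size=(n_rep, n))
sample_means = samples.mean(axis=1)               # unbiased estimator of mu

crlb = sigma**2 / n                               # 1 / I(mu) = sigma^2 / n
print("CRLB              :", crlb)                # 0.18
print("empirical variance:", sample_means.var())  # approximately 0.18
```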

But what does this really tell us? For example, recall that

$$\mathrm{var}_\theta(T_i(X)) = [\mathrm{cov}_\theta(T(X))]_{i,i}$$

and that for every positive semi-definite matrix $A$ the diagonal elements are non-negative:

$$\forall_i \quad A_{i,i} \geq 0$$

From the above we can conclude that the variance of each estimated element is bounded below by the corresponding diagonal element of the matrix $B(\theta)$:

$$\forall_i \quad \mathrm{var}_\theta(T_i(X)) \geq [B(\theta)]_{i,i}$$

So the CRLB doesn’t tell us the variance of our estimator, but rather whether or not our estimator is optimal, i.e. whether it has the lowest covariance among all unbiased estimators.
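To see the diagonal bound and the optimality question in action, here is a minimal Monte Carlo sketch (assuming NumPy; the two-parameter normal model and the particular numbers are my own choices for illustration). For $\theta = (\mu, \sigma^2)$ the Fisher information matrix is $I(\theta) = \mathrm{diag}(n/\sigma^2,\ n/(2\sigma^4))$, so the diagonal of $B(\theta) = I^{-1}(\theta)$ is $(\sigma^2/n,\ 2\sigma^4/n)$. The sample mean attains its diagonal bound, while the unbiased sample variance (with the $1/(n-1)$ divisor) has variance $2\sigma^4/(n-1)$, slightly above its bound.

```python
# Minimal sketch: compare the empirical variances of unbiased estimators of
# (mu, sigma^2) with the diagonal of B(theta) = I^{-1}(theta).
# Assumes NumPy is installed.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, n_rep = 0.0, 4.0, 30, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(n_rep, n))
t_mean = x.mean(axis=1)            # unbiased estimator of mu
t_var = x.var(axis=1, ddof=1)      # unbiased estimator of sigma^2 (1/(n-1) divisor)

bound = np.array([sigma2 / n, 2 * sigma2**2 / n])   # diagonal of I^{-1}(theta)
empirical = np.array([t_mean.var(), t_var.var()])

print("diagonal of B(theta):", bound)      # [0.1333, 1.0667]
print("empirical variances :", empirical)  # approximately [0.133, 1.10]
```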

Attribution
Source: Link, Question Author: user122358, Answer Author: Community