How to interpret the Delta Method?

I’m reading through https://www.statlect.com/asymptotic-theory/delta-method, which defines the Delta Method as:

The delta method is a method that allows us to derive, under
appropriate conditions, the asymptotic distribution of
$g(\hat{\theta}_n)$ from the asymptotic distribution of
$\hat{\theta}_n$.

and one example says, in short:

A sequence $\hat{\theta}_n$ is asymptotically normal with asymptotic mean 1
and asymptotic variance 1. We want to derive the asymptotic distribution of the
sequence $\hat{\theta}_n^2$.

And the solution is:

$$\sqrt{n}(\hat{\theta}_n^2-1) \xrightarrow{D} N(0,4)$$

  1. How do I interpret this result? It doesn’t tell me the distribution of $\hat{\theta}_n^2$; instead, it tells me the distribution of a shifted and scaled version of it.
  2. The steps to arrive at the solution suggest the variance of $\hat{\theta}_n^2$ is 4, and they just plugged it into the $N(0,4)$ above. If this is true, how can the variance of $\hat{\theta}_n^2$ be the same as the variance of $\sqrt{n}(\hat{\theta}_n^2-1)$?

Answer

Some intuition behind the delta method:

The Delta method can be seen as combining two ideas:

  1. Continuous, differentiable functions can be approximated locally by an affine transformation.
  2. An affine transformation of a multivariate normal random variable is multivariate normal.

The 1st idea is from calculus, the 2nd is from probability. The loose intuition / argument goes:

  • The input random variable $\tilde{\boldsymbol{\theta}}_n$ is asymptotically normal (by assumption or by application of a central limit theorem in the case where $\tilde{\boldsymbol{\theta}}_n$ is a sample mean).

  • The smaller the neighborhood, the more $\mathbf{g}(\mathbf{x})$ looks like an affine transformation, that is, the more the function looks like a hyperplane (or a line in the 1 variable case).

  • Where that linear approximation applies (and some regularity conditions hold), the multivariate normality of $\tilde{\boldsymbol{\theta}}_n$ is preserved when function $\mathbf{g}$ is applied to $\tilde{\boldsymbol{\theta}}_n$.

    • Note that function $\mathbf{g}$ has to satisfy certain conditions for this to be true. Normality isn’t preserved in the neighborhood around $x=0$ for $g(x) = x^2$ because you’ll basically get both halves of the bell curve mapped to the same side: both $x=-2$ and $x=2$ get mapped to $y=4$. You need $g$ strictly increasing or decreasing in the neighborhood so that this doesn’t happen.

Idea 1: Locally, any continuous, differentiable function looks affine

A core idea of calculus is that if you zoom in enough on a continuous, differentiable function, it will look like a line (or a hyperplane in the multivariate case). If we have some vector-valued function $\mathbf{g}(\mathbf{x})$, then in a small enough neighborhood around $\mathbf{c}$ you can approximate $\mathbf{g}(\mathbf{c} + \boldsymbol{\epsilon})$ with the following affine function of $\boldsymbol{\epsilon}$:

$$ \mathbf{g}(\mathbf{c} + \boldsymbol{\epsilon}) \approx \mathbf{g}(\mathbf{c}) + \frac{\partial \mathbf{g}(\mathbf{c})}{\partial \mathbf{x}'} \;\boldsymbol{\epsilon} $$
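To make the local approximation concrete, here is a minimal sketch for the scalar function $g(x) = x^2$ from the question, expanded around $c = 1$. The function names and the particular values of `eps` are chosen here for illustration only. For this $g$, the approximation error is exactly $\epsilon^2$, so it shrinks much faster than $\epsilon$ itself.

```python
def g(x):
    """The function from the question."""
    return x ** 2


def g_prime(x):
    """Derivative of g."""
    return 2 * x


def affine_approx(c, eps):
    """First-order (affine) approximation of g(c + eps): g(c) + g'(c) * eps."""
    return g(c) + g_prime(c) * eps


c = 1.0
eps_values = (0.5, 0.1, 0.01)  # illustrative step sizes
errors = []
for eps in eps_values:
    exact = g(c + eps)
    approx = affine_approx(c, eps)
    errors.append(abs(exact - approx))
    print(f"eps={eps}: exact={exact:.6f}, affine={approx:.6f}, error={errors[-1]:.6f}")
```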

Idea 2: An affine transformation of a multivariate normal random variable is multivariate normal

Let’s say we have $\tilde{\boldsymbol{\theta}}$ distributed multivariate normal with mean $\boldsymbol{\mu}$ and variance $V$. That is:
$$\tilde{\boldsymbol{\theta}} \sim \mathcal{N}\left( \boldsymbol{\mu}, V\right)$$

Consider a linear transformation $A$ and consider the multivariate normal random variable defined by the linear transformation $A\tilde{\boldsymbol{\theta}}$. It’s easy to show:
$$A\tilde{\boldsymbol{\theta}} - A\boldsymbol{\mu} \sim \mathcal{N}\left(\mathbf{0}, AVA'\right)$$
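A quick simulation can sanity-check the mean and variance in this formula (it does not test normality itself). This is a sketch under assumed values: the vector `mu`, the matrix `V`, and the $1 \times 2$ map `A` below are hypothetical numbers picked for illustration, not taken from the original answer.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical example values (not from the original answer).
mu = (1.0, -1.0)               # mean vector of theta
V = ((2.0, 0.5), (0.5, 1.0))   # covariance matrix of theta
A = (1.0, 2.0)                 # a 1x2 linear map, so A theta is a scalar

# Cholesky factor L of V (V = L L'), written out by hand for the 2x2 case.
l11 = math.sqrt(V[0][0])
l21 = V[1][0] / l11
l22 = math.sqrt(V[1][1] - l21 ** 2)


def draw_theta():
    """Draw one theta ~ N(mu, V) as mu + L z, with z standard normal."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)


# Theoretical variance A V A'.
theory_var = sum(A[i] * V[i][j] * A[j] for i in range(2) for j in range(2))

# Simulate A theta - A mu many times.
a_mu = A[0] * mu[0] + A[1] * mu[1]
centered = []
for _ in range(100_000):
    t = draw_theta()
    centered.append(A[0] * t[0] + A[1] * t[1] - a_mu)

sample_mean = statistics.fmean(centered)
sample_var = statistics.variance(centered)
print(f"theory A V A' = {theory_var:.3f}")
print(f"sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```

The sample mean should land near zero and the sample variance near $AVA'$, up to simulation noise.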

Putting it together:

If we know that $\tilde{\boldsymbol{\theta}} \sim \mathcal{N}\left( \boldsymbol{\mu}, V\right)$ and that function $\mathbf{g}(\mathbf{x})$ can be approximated around $\boldsymbol{\mu}$ by $\mathbf{g}(\boldsymbol{\mu}) + \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} \;\boldsymbol{\epsilon}$ then putting ideas (1) and (2) together:

$$ \mathbf{g}\left( \tilde{\boldsymbol{\theta}} \right) - \mathbf{g}(\boldsymbol{\mu}) \sim \mathcal{N} \left( \mathbf{0}, \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} V \left(\frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'}\right)' \right) $$
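To connect this with the $\sqrt{n}$-scaled statement in the question, the scalar version of the result can be sketched as follows. The symbol $\sigma^2$ for the asymptotic variance is introduced here for illustration, and the usual regularity conditions ($g$ continuously differentiable at $\mu$ with $g'(\mu) \neq 0$) are assumed:

$$ \sqrt{n}\left( \tilde{\theta}_n - \mu \right) \xrightarrow{d} \mathcal{N}(0, \sigma^2) \quad \Longrightarrow \quad \sqrt{n}\left( g(\tilde{\theta}_n) - g(\mu) \right) \xrightarrow{d} \mathcal{N}\left(0, \; g'(\mu)^2 \sigma^2\right) $$

Read informally, for large $n$ the distribution of $g(\tilde{\theta}_n)$ is approximately $\mathcal{N}\left(g(\mu), \; g'(\mu)^2 \sigma^2 / n\right)$. The limit is stated for the shifted and scaled quantity because the variance of the unscaled quantity shrinks to zero as $n$ grows.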

What can go wrong?

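We have a problem doing this if the derivative $\frac{\partial \mathbf{g}(\mathbf{c})}{\partial \mathbf{x}'}$ is zero at the point of expansion (e.g. $g(x) = x^2$ at $x=0$), because the linear term then vanishes and the normal approximation becomes degenerate. In the one-variable case, we need $g$ strictly increasing or decreasing in the region where $\tilde{\boldsymbol{\theta}}_n$ has probability mass.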

This is also going to be a bad approximation if $g$ doesn’t look like an affine function in the region where $\tilde{\boldsymbol{\theta}}_n$ has probability mass.

It may also be a bad approximation if $\tilde{\boldsymbol{\theta}}_n$ isn’t normal.
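The zero-derivative failure can be seen in a small simulation of $g(x) = x^2$ expanded at $\mu = 0$. This is a sketch under an explicit assumption: $\tilde{\theta}_n$ is drawn as exactly $\mathcal{N}(0, 1/n)$, which is what the mean of $n$ iid standard normal draws would give. The sample size `n` and replication count `reps` are hypothetical choices.

```python
import math
import random
import statistics

rng = random.Random(1)

n = 400        # hypothetical sample size
reps = 50_000  # number of simulated estimates

# Assumption: theta_n is exactly N(0, 1/n), as for the mean of n iid N(0, 1) draws.
theta = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(reps)]

# Usual delta-method scaling: sqrt(n) * (g(theta_n) - g(0)) with g(x) = x^2.
root_n_scaled = [math.sqrt(n) * t ** 2 for t in theta]

# Faster scaling n * theta_n^2: this is the square of a standard normal draw.
n_scaled = [n * t ** 2 for t in theta]

var_root_n = statistics.variance(root_n_scaled)
mean_n = statistics.fmean(n_scaled)
min_n = min(n_scaled)
print(f"variance of sqrt(n) * theta^2: {var_root_n:.4f}")
print(f"mean of n * theta^2: {mean_n:.3f}, minimum: {min_n:.6f}")
```

With the usual $\sqrt{n}$ scaling the variance collapses toward zero, consistent with $g'(0)^2 = 0$. Rescaling by $n$ instead gives a quantity that is never negative, so it cannot be normal; under the stated assumption it is a chi-square variable with one degree of freedom.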

This problem:

$$g(x) = x^2 \quad \quad g'(x) = 2 x $$

If $\sqrt{n}\left( \tilde{\theta}_n - \mu \right) \xrightarrow{d} \mathcal{N}(0, 1)$,
then applying the delta method you get…
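As a numerical check on the result quoted in the question, $\sqrt{n}(\hat{\theta}_n^2-1) \xrightarrow{D} N(0,4)$, the following sketch simulates the estimator with $\mu = 1$. It assumes, for illustration, that $\tilde{\theta}_n$ is exactly $\mathcal{N}(1, 1/n)$; the values of `n` and `reps` are hypothetical. It also speaks to the question's second point: the value 4 is the variance of the scaled quantity, while the variance of $\tilde{\theta}_n^2$ itself is roughly $4/n$.

```python
import math
import random
import statistics

rng = random.Random(2)

mu, sigma = 1.0, 1.0  # asymptotic mean and standard deviation from the question
n = 400               # hypothetical sample size
reps = 50_000         # number of simulated estimates

# Assumption: theta_n is exactly N(mu, sigma^2 / n).
theta = [rng.gauss(mu, sigma / math.sqrt(n)) for _ in range(reps)]

squared = [t ** 2 for t in theta]                       # g(theta_n) = theta_n^2
scaled = [math.sqrt(n) * (s - mu ** 2) for s in squared]  # sqrt(n) * (theta_n^2 - 1)

var_squared = statistics.variance(squared)
var_scaled = statistics.variance(scaled)
delta_method_var = (2 * mu) ** 2 * sigma ** 2  # g'(mu)^2 * sigma^2

print(f"delta-method variance: {delta_method_var:.3f}")
print(f"variance of sqrt(n) * (theta^2 - 1): {var_scaled:.3f}")
print(f"variance of theta^2: {var_squared:.5f} (compare 4 / n = {4 / n:.5f})")
```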

Attribution
Source: Link, Question Author: foobar, Answer Author: Matthew Gunn
