I’m reading through https://www.statlect.com/asymptotictheory/deltamethod, which defines the delta method as:
The delta method is a method that allows us to derive, under
appropriate conditions, the asymptotic distribution of
$g(\hat{\theta}_n)$ from the asymptotic distribution of
$\hat{\theta}_n$.
And one example says, in short:
A sequence $\hat{\theta}_n$ is asymptotically normal with mean 1
and variance 1. We want to derive the asymptotic distribution of the
sequence $\hat{\theta}_n^2$.
And the solution is:
$$\sqrt{n}(\hat{\theta}_n^2 - 1) \xrightarrow{D} N(0,4)$$
 How do I interpret this result? It doesn’t tell me the distribution of $\hat{\theta}_n^2$; instead, it tells me the distribution of a shifted and scaled version of it.
 The steps to arrive at the solution suggest the variance of $\hat{\theta}_n^2$ is 4, and they just plugged it into the $N(0,4)$ above. If this is true, how come the variance of $\hat{\theta}_n^2$ is the variance of $\sqrt{n}(\hat{\theta}_n^2 - 1)$?
Answer
Some intuition behind the delta method:
The Delta method can be seen as combining two ideas:
 Continuous, differentiable functions can be approximated locally by an affine transformation.
 An affine transformation of a multivariate normal random variable is multivariate normal.
The 1st idea is from calculus, the 2nd is from probability. The loose intuition / argument goes:

The input random variable $\tilde{\boldsymbol{\theta}}_n$ is asymptotically normal (by assumption or by application of a central limit theorem in the case where $\tilde{\boldsymbol{\theta}}_n$ is a sample mean).

The smaller the neighborhood, the more $\mathbf{g}(\mathbf{x})$ looks like an affine transformation, that is, the more the function looks like a hyperplane (or a line in the 1 variable case).

Where that linear approximation applies (and some regularity conditions hold), the multivariate normality of $\tilde{\boldsymbol{\theta}}_n$ is preserved when function $\mathbf{g}$ is applied to $\tilde{\boldsymbol{\theta}}_n$.
 Note that function $\mathbf{g}$ has to satisfy certain conditions for this to be true. Normality isn’t preserved in the neighborhood around $x=0$ for $g(x) = x^2$ because you’ll basically get both halves of the bell curve mapped to the same side: both $x=-2$ and $x=2$ get mapped to $y=4$. You need $g$ strictly increasing or decreasing in the neighborhood so that this doesn’t happen.
Idea 1: Locally, any continuous, differentiable function looks affine
A core idea of calculus is that if you zoom in enough on a continuous, differentiable function, it will look like a line (or a hyperplane in the multivariate case). If we have some vector-valued function $\mathbf{g}(\mathbf{x})$, then in a small enough neighborhood around $\mathbf{c}$ you can approximate $\mathbf{g}(\mathbf{c} + \boldsymbol{\epsilon})$ with the affine function of $\boldsymbol{\epsilon}$ below:
$$ \mathbf{g}(\mathbf{c} + \boldsymbol{\epsilon}) \approx \mathbf{g}(\mathbf{c}) + \frac{\partial \mathbf{g}(\mathbf{c})}{\partial \mathbf{x}'} \;\boldsymbol{\epsilon} $$
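As a rough numerical illustration of this idea, the sketch below compares $g(c + \epsilon)$ with its affine approximation for the scalar function $g(x) = x^2$ from the question, expanded around $c = 1$. The helper names (`g`, `g_prime`, `affine_approx`) and the chosen values of $\epsilon$ are assumptions made for illustration only.

```python
def g(x):
    """The function from the question: g(x) = x^2."""
    return x ** 2


def g_prime(x):
    """Derivative of g: g'(x) = 2x."""
    return 2 * x


def affine_approx(c, eps):
    """First-order (affine) approximation g(c) + g'(c) * eps."""
    return g(c) + g_prime(c) * eps


c = 1.0  # expansion point (assumed, matches the mean in the question)
errors = {}
for eps in (0.5, 0.1, 0.01):
    exact = g(c + eps)
    approx = affine_approx(c, eps)
    # For g(x) = x^2 the approximation error is exactly eps^2.
    errors[eps] = abs(exact - approx)
    print(f"eps={eps}: exact={exact:.6f}, affine={approx:.6f}, error={errors[eps]:.6f}")
```

The error shrinks much faster than $\epsilon$ itself, which is what "looks like a line when you zoom in" means here.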
Idea 2: An affine transformation of a multivariate normal random variable is multivariate normal
Let’s say we have $\tilde{\boldsymbol{\theta}}$ distributed multivariate normal with mean $\boldsymbol{\mu}$ and variance $V$. That is:
$$\tilde{\boldsymbol{\theta}} \sim \mathcal{N}\left( \boldsymbol{\mu}, V\right)$$
Consider a linear transformation $A$ and consider the multivariate normal random variable defined by the linear transformation $A\tilde{\boldsymbol{\theta}}$. It’s easy to show:
$$A\tilde{\boldsymbol{\theta}} - A\boldsymbol{\mu} \sim \mathcal{N}\left(\mathbf{0}, AVA'\right)$$
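A quick simulation can make the $AVA'$ formula concrete. The sketch below draws from a two-dimensional normal distribution using only Python's standard library and compares the sample variance of $A\tilde{\boldsymbol{\theta}} - A\boldsymbol{\mu}$ with $AVA'$. The particular values of `mu`, `V`, and `A` are arbitrary assumptions, and the hand-written 2x2 Cholesky factor is just one way to generate correlated draws.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Illustrative (assumed) parameters: mean vector, covariance matrix, 1x2 matrix A.
mu = (1.0, -1.0)
V = ((2.0, 0.5), (0.5, 1.0))
A = (1.0, 2.0)

# Cholesky factor L of the 2x2 covariance matrix V, so that V = L L'.
l11 = math.sqrt(V[0][0])
l21 = V[1][0] / l11
l22 = math.sqrt(V[1][1] - l21 ** 2)


def draw_theta():
    """Draw one sample of theta ~ N(mu, V) as mu + L z, with z standard normal."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)


a_mu = A[0] * mu[0] + A[1] * mu[1]
samples = []
for _ in range(100_000):
    theta = draw_theta()
    samples.append(A[0] * theta[0] + A[1] * theta[1] - a_mu)

# A V A' written out as a double sum (a scalar here, because A is 1x2).
theoretical_var = sum(A[i] * V[i][j] * A[j] for i in range(2) for j in range(2))
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

print(f"A V A' = {theoretical_var:.3f}, sample variance = {sample_var:.3f}")
print(f"sample mean of A*theta - A*mu = {sample_mean:.4f}")
```

The sample variance should land close to $AVA'$ and the sample mean close to zero, up to simulation noise.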
Putting it together:
If we know that $\tilde{\boldsymbol{\theta}} \sim \mathcal{N}\left( \boldsymbol{\mu}, V\right)$ and that function $\mathbf{g}(\mathbf{x})$ can be approximated around $\boldsymbol{\mu}$ by $\mathbf{g}(\boldsymbol{\mu}) + \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} \;\boldsymbol{\epsilon}$, then putting ideas (1) and (2) together gives, approximately:
$$ \mathbf{g}\left( \tilde{\boldsymbol{\theta}} \right) - \mathbf{g}(\boldsymbol{\mu}) \sim \mathcal{N} \left( \mathbf{0}, \frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'} V \left(\frac{\partial \mathbf{g}(\boldsymbol{\mu})}{\partial \mathbf{x}'}\right)'\right) $$
What can go wrong?
We have a problem doing this if any component of $\frac{\partial \mathbf{g}(\mathbf{c})}{\partial \mathbf{x}'}$ is zero (e.g., $g(x) = x^2$ at $x=0$). We need $g$ strictly increasing or decreasing in the region where $\tilde{\boldsymbol{\theta}}_n$ has probability mass.
This is also going to be a bad approximation if $g$ doesn’t look like an affine function in the region where $\tilde{\boldsymbol{\theta}}_n$ has probability mass.
It may also be a bad approximation if $\tilde{\boldsymbol{\theta}}_n$ isn’t normal.
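The zero-derivative failure can be seen in a small simulation. The sketch below assumes $\tilde{\theta}_n$ is the mean of $n$ i.i.d. $\mathcal{N}(\mu, 1)$ draws, so it is sampled directly as $\mathcal{N}(\mu, 1/n)$, with $\mu = 0$ and $g(x) = x^2$. The sample size `n`, the number of replications, and the seed are arbitrary assumptions.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible

n = 1_000   # assumed sample size behind each estimate
mu = 0.0    # the point where g'(mu) = 2 * mu = 0

draws = []
for _ in range(20_000):
    # theta_n sampled directly as N(mu, 1/n), i.e. the mean of n N(mu, 1) draws.
    theta_n = rng.gauss(mu, 1.0 / math.sqrt(n))
    draws.append(math.sqrt(n) * (theta_n ** 2 - mu ** 2))

# A centered normal limit would put about half of its mass below zero.
frac_negative = sum(d < 0 for d in draws) / len(draws)
mean_scaled = statistics.fmean(draws)

print(f"fraction of negative values: {frac_negative}")
print(f"mean of sqrt(n) * (theta_n^2 - mu^2): {mean_scaled:.4f}")
```

Every simulated value is non-negative and the whole distribution collapses toward zero as `n` grows, so nothing like a centered normal with positive variance appears: both halves of the bell curve have been folded onto the same side.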
This problem:
$$g(x) = x^2 \quad \quad g'(x) = 2 x $$
If $\sqrt{n}\left( \tilde{\theta}_n - \mu \right) \xrightarrow{d} \mathcal{N}(0, 1)$
Applying the delta method you get…
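The $N(0,4)$ result quoted in the question can be checked numerically. The sketch below assumes $\tilde{\theta}_n$ is exactly $\mathcal{N}(\mu, 1/n)$ with $\mu = 1$ (as it would be for the mean of $n$ i.i.d. $\mathcal{N}(1, 1)$ draws) and compares the simulated variance of $\sqrt{n}(\tilde{\theta}_n^2 - \mu^2)$ with $g'(\mu)^2 \cdot 1 = (2\mu)^2$. The values of `n`, `reps`, and the seed are arbitrary assumptions.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

n = 10_000      # assumed sample size behind each estimate
mu = 1.0        # asymptotic mean from the question
reps = 100_000  # number of simulated estimates

scaled = []
for _ in range(reps):
    # theta_n sampled directly as N(mu, 1/n), i.e. the mean of n N(mu, 1) draws.
    theta_n = rng.gauss(mu, 1.0 / math.sqrt(n))
    # The shifted and scaled quantity whose limit the delta method describes.
    scaled.append(math.sqrt(n) * (theta_n ** 2 - mu ** 2))

sample_mean = statistics.fmean(scaled)
sample_var = statistics.variance(scaled)
delta_var = (2 * mu) ** 2 * 1.0  # g'(mu)^2 times the asymptotic variance 1

print(f"simulated mean:     {sample_mean:.4f}")
print(f"simulated variance: {sample_var:.4f}")
print(f"delta-method variance: {delta_var}")
```

Read this way, the 4 is the variance of the shifted and scaled quantity, not of $\tilde{\theta}_n^2$ itself. Undoing the shift and scale suggests that, for large $n$, $\tilde{\theta}_n^2$ is approximately $\mathcal{N}(1, 4/n)$, which is how the result is usually used in practice.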
Attribution
Source: Link, Question Author: foobar, Answer Author: Matthew Gunn