Derivation of normalizing transform for GLMs

\newcommand{\E}{\mathbb{E}}

How is the normalizing transform A(\cdot) = \displaystyle\int\frac{d\mu}{V^{1/3}(\mu)} for the exponential family derived?

More specifically: I tried to follow the Taylor expansion sketch on page 3, slide 1 here but have several questions. With X from an exponential family, transformation h(X), and \kappa_i denoting the i^{th} cumulant, the slides argue that:

\kappa_3(h(\bar{X})) \approx h'(\mu)^3\frac{\kappa_3(\bar{X})}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N} + O(N^{-3}),

and it remains to simply find h(X) such that the above evaluates to 0.

  1. My first question is about arithmetic: my Taylor expansion has different coefficients, and I can’t justify their having dropped many of the terms.

    \begin{align}
    \text{Since }h(x) &\approx h(\mu) + h'(\mu)(x - \mu) + \frac{h''(\mu)}{2}(x - \mu)^2\text{, we have:} \\
    h(\bar{X}) - h(\mu) &\approx h'(\mu)(\bar{X} - \mu) + \frac{h''(\mu)}{2}(\bar{X} - \mu)^2 \\
    \E\left(h(\bar{X}) - h(\mu)\right)^3 &\approx h'(\mu)^3 \E(\bar{X}-\mu)^3 + \frac{3}{2}h'(\mu)^2h''(\mu) \E(\bar{X} - \mu)^4 + \\
    &\quad \frac{3}{4}h'(\mu)h''(\mu)^2 \E(\bar{X}-\mu)^5 + \frac{1}{8}h''(\mu)^3 \E(\bar{X} - \mu)^6.
    \end{align}

    I can get to something similar by replacing the central moments by their cumulant equivalents, but it still doesn’t add up.

  2. The second question: why does the analysis start with \bar{X} instead of X, the quantity we actually care about?


The slides you link to are somewhat confusing, leaving out steps and making a few typos, but they are ultimately correct. It will help to answer question 2 first, then 1, and then finally derive the symmetrizing transformation A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta.

Question 2. We analyze \bar{X} because it is the mean of a sample of N i.i.d. random variables X_1, \dots, X_N. This is an important quantity because repeatedly sampling one distribution and averaging the results happens all the time in science. We want to know how close \bar{X} is to the true mean \mu. The law of large numbers says \bar{X} converges to \mu as N \to \infty, and the Central Limit Theorem says its distribution is approximately normal for large N, but we would also like to know the variance and skewness of \bar{X} at finite N.

Question 1. Your Taylor series approximation is not incorrect, but we need to be careful about keeping track of \bar{X} vs. X_i and powers of N to get to the same conclusion as the slides. We’ll start with the definitions of \bar{X} and central moments of X_i and derive the formula for \kappa_3(h(\bar{X})):

\bar{X} = \frac{1}{N}\sum_{i=1}^N X_i

\mathbb{E}[X_i] = \mu

V(X_i) = \mathbb{E}[(X_i - \mu)^2] = \sigma^2

\kappa_3(X_i) = \mathbb{E}[(X_i - \mu)^3]

Now, the central moments of \bar{X}:

\mathbb{E}[\bar{X}] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[X_i] = \frac{1}{N}(N\mu) = \mu

\begin{align}
V(\bar{X}) &= \mathbb{E}[(\bar{X} - \mu)^2]\\
&= \mathbb{E}\Big[\Big(\Big(\frac{1}{N}\sum_{i=1}^N X_i\Big) - \mu\Big)^2\Big]\\
&= \mathbb{E}\Big[\Big(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\Big)^2\Big]\\
&= \frac{1}{N^2}\Big(N\mathbb{E}[(X_i - \mu)^2] + N(N-1)\mathbb{E}[X_i - \mu]\mathbb{E}[X_j - \mu]\Big)\\
&= \frac{1}{N}\sigma^2
\end{align}

The last step follows since \mathbb{E}[X_i - \mu] = 0 and \mathbb{E}[(X_i - \mu)^2] = \sigma^2. This might not be the easiest derivation of V(\bar{X}), but it is the same process we need to find \kappa_3(\bar{X}) and \kappa_3(h(\bar{X})): expand a power of a sum and count the number of terms with each combination of powers of the different variables. In the case above, there were N terms of the form (X_i - \mu)^2 and N(N-1) terms of the form (X_i - \mu)(X_j - \mu) with i \neq j.

\begin{align}
\kappa_3(\bar{X}) &= \mathbb{E}[(\bar{X}-\mu)^3]\\
&= \mathbb{E}\Big[\Big(\Big(\frac{1}{N}\sum_{i=1}^N X_i\Big) - \mu\Big)^3\Big]\\
&= \mathbb{E}\Big[\Big(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\Big)^3\Big]\\
&= \frac{1}{N^3}\Big(N\mathbb{E}[(X_i - \mu)^3] + 3N(N-1)\mathbb{E}[X_i - \mu]\mathbb{E}[(X_j - \mu)^2] + N(N-1)(N-2)\mathbb{E}[X_i - \mu]\mathbb{E}[X_j - \mu]\mathbb{E}[X_k - \mu]\Big)\\
&= \frac{1}{N^2}\mathbb{E}[(X_i - \mu)^3]\\
&= \frac{\kappa_3(X_i)}{N^2}
\end{align}
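As a sanity check on the two results so far, V(\bar{X}) = \sigma^2/N and \kappa_3(\bar{X}) = \kappa_3(X_i)/N^2, here is a minimal Python sketch that enumerates every possible sample exactly. The three-point distribution and the sample size N = 4 are arbitrary choices made for illustration, not something taken from the slides.

```python
from fractions import Fraction
from itertools import product

def central_moment(values, probs, k):
    """k-th central moment of a discrete distribution given as values and probabilities."""
    mu = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mu) ** k for v, p in zip(values, probs))

def sample_mean_distribution(values, probs, n):
    """Exact distribution of the mean of n i.i.d. draws, by enumerating all outcomes."""
    dist = {}
    for outcome in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        total = Fraction(0)
        for i in outcome:
            prob *= probs[i]
            total += values[i]
        xbar = total / n
        dist[xbar] = dist.get(xbar, Fraction(0)) + prob
    xs = sorted(dist)
    return xs, [dist[x] for x in xs]

# Hypothetical skewed distribution on {0, 1, 4} (illustrative assumption)
values = [Fraction(0), Fraction(1), Fraction(4)]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
N = 4

xs, ps = sample_mean_distribution(values, probs, N)
# Compare the exact moments of the sample mean with sigma^2 / N and kappa_3 / N^2
print(central_moment(xs, ps, 2) == central_moment(values, probs, 2) / N)
print(central_moment(xs, ps, 3) == central_moment(values, probs, 3) / N ** 2)
```

Because the arithmetic uses exact fractions, the two comparisons test the identities themselves rather than a floating-point approximation of them.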

Next, we will expand h(\bar{X}) in a Taylor series as you have:

h(\bar{X}) = h(\mu) + h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 + \frac{1}{6}h'''(\mu)(\bar{X}-\mu)^3 + \dots

\begin{align}
\mathbb{E}[h(\bar{X})] &= h(\mu) + h'(\mu)\mathbb{E}[\bar{X} - \mu] + \frac{1}{2}h''(\mu)\mathbb{E}[(\bar{X}-\mu)^2] + \frac{1}{6}h'''(\mu)\mathbb{E}[(\bar{X}-\mu)^3] + \dots\\
&= h(\mu) + \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} + \frac{1}{6}h'''(\mu)\frac{\kappa_3(X_i)}{N^2} + \dots
\end{align}

With some more effort you could show that the omitted terms are at most O(N^{-2}), so \mathbb{E}[h(\bar{X})] = h(\mu) + \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} + O(N^{-2}). Finally, since \kappa_3(h(\bar{X})) = \mathbb{E}[(h(\bar{X})-\mathbb{E}[h(\bar{X})])^3] (which is not the same as \mathbb{E}[(h(\bar{X})-h(\mu))^3]), we make a similar computation:

\begin{align}
\kappa_3(h(\bar{X})) &= \mathbb{E}[(h(\bar{X})-\mathbb{E}[h(\bar{X})])^3]\\
&= \mathbb{E}\Big[\Big(h(\mu) + h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 + O((\bar{X}-\mu)^3) - h(\mu) - \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} - O(N^{-2})\Big)^3\Big]
\end{align}

We are only interested in terms of order O(N^{-2}), and with extra work you could show that the terms “O((\bar{X}-\mu)^3)” and “- O(N^{-2})” can be dropped before taking the third power, as they only contribute terms of order O(N^{-3}). So, simplifying, we get

\begin{align}
\kappa_3(h(\bar{X})) &= \mathbb{E}\Big[\Big(h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 - \frac{1}{2}h''(\mu)\frac{\sigma^2}{N}\Big)^3\Big]\\
&= \mathbb{E}\Big[h'(\mu)^3(\bar{X} - \mu)^3 + \frac{1}{8}h''(\mu)^3(\bar{X}-\mu)^6 - \frac{1}{8}h''(\mu)^3\frac{\sigma^6}{N^3} + \frac{3}{2}h'(\mu)^2h''(\mu)(\bar{X}-\mu)^4 + \frac{3}{4}h'(\mu)h''(\mu)^2(\bar{X}-\mu)^5 - \frac{3}{2}h'(\mu)^2h''(\mu)(\bar{X} - \mu)^2\frac{\sigma^2}{N} + O(N^{-3})\Big]
\end{align}

I left off some terms that were obviously O(N^{-3}) in this product. You’ll have to convince yourself that the terms \mathbb{E}[(\bar{X}-\mu)^5] and \mathbb{E}[(\bar{X}-\mu)^6] are O(N^{-3}) as well. However,

\begin{align}
\mathbb{E}[(\bar{X}-\mu)^4] &= \mathbb{E}\Big[\frac{1}{N^4}\Big(\sum_{i=1}^N(X_i-\mu)\Big)^4\Big]\\
&= \frac{1}{N^4}\Big(N\mathbb{E}[(X_i-\mu)^4] + 3N(N-1)\mathbb{E}[(X_i-\mu)^2]\mathbb{E}[(X_j-\mu)^2] + 0\Big)\\
&= \frac{3}{N^2}\sigma^4 + O(N^{-3})
\end{align}
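The fourth-moment count can be checked exactly in a simple case. The sketch below assumes Bernoulli(p) draws with p = 3/10 (an arbitrary illustrative choice), for which the sum of the draws is binomial, and compares the exact value of \mathbb{E}[(\bar{X}-\mu)^4] with the expression (N\mu_4 + 3N(N-1)\sigma^4)/N^4, where \mu_4 denotes the fourth central moment of a single draw.

```python
from fractions import Fraction
from math import comb

def xbar_central_moment(n, p, k):
    """Exact k-th central moment of the mean of n i.i.d. Bernoulli(p) draws."""
    return sum(
        comb(n, s) * p ** s * (1 - p) ** (n - s) * (Fraction(s, n) - p) ** k
        for s in range(n + 1)
    )

p = Fraction(3, 10)                            # illustrative assumption
sigma2 = p * (1 - p)                           # variance of one draw
mu4 = p * (1 - p) ** 4 + (1 - p) * p ** 4      # fourth central moment of one draw

for n in (5, 50, 500):
    exact = xbar_central_moment(n, p, 4)
    formula = (n * mu4 + 3 * n * (n - 1) * sigma2 ** 2) / Fraction(n ** 4)
    # N^2 * E[(Xbar - mu)^4] should approach 3 * sigma^4 as n grows
    print(n, exact == formula, float(n ** 2 * exact), float(3 * sigma2 ** 2))
```

The third printed column drifts toward the fourth as n grows, which is the 3\sigma^4/N^2 leading term; the gap between them is the O(N^{-3}) remainder.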

Then distributing the expectation on our equation for \kappa_3(h(\bar{X})), we have

\begin{align}
\kappa_3(h(\bar{X})) &= h'(\mu)^3\mathbb{E}[(\bar{X} - \mu)^3] + \frac{3}{2}h'(\mu)^2h''(\mu)\mathbb{E}[(\bar{X}-\mu)^4] - \frac{3}{2}h'(\mu)^2h''(\mu)\mathbb{E}[(\bar{X} - \mu)^2]\frac{\sigma^2}{N} + O(N^{-3})\\
&= h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + \frac{9}{2}h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} - \frac{3}{2}h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} + O(N^{-3})\\
&= h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} + O(N^{-3})
\end{align}
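If you want numerical reassurance that the final coefficient is 3 and the power of N is N^{-2}, the following sketch compares the formula with a direct computation. It assumes Poisson(\lambda) draws with \lambda = 4 and N = 400, and uses h(x) = \sqrt{x} purely as a test function (it is not the symmetrizing transform). For the Poisson distribution \sigma^2 = \kappa_3 = \lambda, and the sum of the draws is Poisson(N\lambda), so the distribution of \bar{X} is available without simulation.

```python
import math

def poisson_pmf(k, m):
    # Poisson(m) probability of k, computed on the log scale for stability
    return math.exp(-m + k * math.log(m) - math.lgamma(k + 1))

def scaled_k3(h, lam, n):
    """Return N^2 * kappa_3(h(Xbar)) for the mean of n i.i.d. Poisson(lam) draws.

    The sum of the draws is Poisson(n * lam), so the distribution of Xbar is
    known exactly; the sum below is truncated far out in the upper tail.
    """
    m = n * lam
    k_max = int(m + 15 * math.sqrt(m) + 50)
    weights = [poisson_pmf(k, m) for k in range(k_max + 1)]
    total = sum(weights)
    weights = [w / total for w in weights]   # renormalize after truncation
    values = [h(k / n) for k in range(k_max + 1)]
    mean = sum(w * y for w, y in zip(weights, values))
    k3 = sum(w * (y - mean) ** 3 for w, y in zip(weights, values))
    return n ** 2 * k3

lam, n = 4.0, 400   # illustrative assumptions

# Leading coefficient from the formula, with sigma^2 = kappa_3 = lam and h = sqrt
h1 = 0.5 * lam ** -0.5      # h'(mu)
h2 = -0.25 * lam ** -1.5    # h''(mu)
predicted = h1 ** 3 * lam + 3 * h1 ** 2 * h2 * lam ** 2

observed = scaled_k3(math.sqrt, lam, n)
identity = scaled_k3(lambda x: x, lam, n)   # h(x) = x should recover kappa_3(X_i) = lam
print(predicted, observed, identity)
```

The predicted and observed values should agree up to a relative error of order 1/N, which is the O(N^{-3}) remainder after multiplying by N^2.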

This concludes the derivation of \kappa_3(h(\bar{X})). Now, at last, we will derive the symmetrizing transform A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta.

For this transformation, it is important that X_i is from an exponential family distribution, and in particular a natural exponential family (or it has been transformed into this distribution), of the form f_{X_i}(x;\theta) = h(x)\exp(\theta x - b(\theta)).

In this case, the cumulants of the distribution are given by \kappa_k = b^{(k)}(\theta). So \mu = b'(\theta), \sigma^2 = V(\theta) = b''(\theta), and \kappa_3 = b'''(\theta). We can write the parameter \theta as a function of \mu by taking the inverse of b', writing \theta(\mu) = (b')^{-1}(\mu). Then

\theta'(\mu) = \frac{1}{b''((b')^{-1}(\mu))} = \frac{1}{b''(\theta)} = \frac{1}{\sigma^2}

Next we can write the variance as a function of \mu, and call this function \bar{V}:

\bar{V}(\mu) = V(\theta(\mu)) = b''(\theta(\mu))

\frac{d}{d\mu}\bar{V}(\mu) = V'(\theta(\mu))\theta'(\mu) = b'''(\theta)\frac{1}{\sigma^2} = \frac{\kappa_3}{\sigma^2}

So as a function of \mu, \kappa_3(\mu) = \bar{V}'(\mu)\bar{V}(\mu).
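A quick way to convince yourself of the identity \kappa_3(\mu) = \bar{V}'(\mu)\bar{V}(\mu) is to test it on a family where everything is available in closed form. The sketch below uses the Bernoulli family as an assumed example (a natural exponential family with \bar{V}(\mu) = \mu(1-\mu)) and computes the third central moment directly from its definition.

```python
from fractions import Fraction

def bernoulli_k3(mu):
    """Third central moment of a Bernoulli variable with mean mu, from the definition."""
    return mu * (1 - mu) ** 3 + (1 - mu) * (0 - mu) ** 3

def v_bar(mu):
    return mu * (1 - mu)      # Bernoulli variance as a function of the mean

def v_bar_prime(mu):
    return 1 - 2 * mu         # derivative of v_bar with respect to mu

# Compare kappa_3 with V_bar'(mu) * V_bar(mu) at a few arbitrary means
checks = [
    bernoulli_k3(mu) == v_bar_prime(mu) * v_bar(mu)
    for mu in (Fraction(1, 10), Fraction(1, 3), Fraction(3, 4))
]
print(checks)
```

The same check works by hand for the Poisson family, where \bar{V}(\mu) = \mu, \bar{V}'(\mu) = 1 and \kappa_3 = \mu.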

Now, for the symmetrizing transformation, we want to reduce the skewness of h(\bar{X}) by making h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} = 0, so that \kappa_3(h(\bar{X})) is O(N^{-3}). Thus, we want

h'(\mu)^3\kappa_3(X_i) + 3h'(\mu)^2h''(\mu)\sigma^4 = 0

Substituting our expressions for \sigma^2 and \kappa_3 as functions of \mu, we have:

h'(\mu)^3\bar{V}'(\mu)\bar{V}(\mu) + 3h'(\mu)^2h''(\mu)\bar{V}(\mu)^2 = 0

So h'(\mu)^3\bar{V}'(\mu) + 3h'(\mu)^2h''(\mu)\bar{V}(\mu) = 0, leading to \frac{d}{d\mu}(h'(\mu)^3\bar{V}(\mu)) = 0.

One solution to this differential equation is:

h'(\mu)^3\bar{V}(\mu) = 1,

h'(\mu) = \frac{1}{[\bar{V}(\mu)]^{1/3}}

So h(\mu) = \int_c^\mu \frac{1}{[\bar{V}(\theta)]^{1/3}} d\theta for any constant c. This gives us the symmetrizing transformation A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta, where V is the variance as a function of the mean in a natural exponential family. Note that \theta inside these integrals is a dummy variable ranging over values of the mean, not the natural parameter.
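To see the transformation at work, take the exponential distribution as an assumed example: it is a natural exponential family with \bar{V}(\mu) = \mu^2, so h'(\mu) = \mu^{-2/3} and h(\mu) is proportional to \mu^{1/3}. The sketch below compares the skewness of \bar{X} with that of \bar{X}^{1/3}, using the fact that \bar{X} is Gamma distributed with shape N and scale \mu/N, so its fractional moments have a closed form. The values N = 10 and \mu = 2 are arbitrary.

```python
import math

def xbar_moment(r, n, mu):
    """E[Xbar**r] for the mean of n i.i.d. exponential draws with mean mu.

    Xbar ~ Gamma(shape=n, scale=mu/n), so E[Xbar**r] = (mu/n)**r * Gamma(n+r) / Gamma(n).
    """
    return (mu / n) ** r * math.exp(math.lgamma(n + r) - math.lgamma(n))

def skewness_of_power(a, n, mu):
    """Skewness of Xbar**a, assembled from raw moments of Xbar."""
    m1 = xbar_moment(a, n, mu)
    m2 = xbar_moment(2 * a, n, mu)
    m3 = xbar_moment(3 * a, n, mu)
    var = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    return k3 / var ** 1.5

n, mu = 10, 2.0   # illustrative assumptions
skew_raw = skewness_of_power(1.0, n, mu)          # untransformed sample mean
skew_cbrt = skewness_of_power(1.0 / 3.0, n, mu)   # after the cube-root transform
print(skew_raw, skew_cbrt)
```

The untransformed skewness is 2/\sqrt{N}, while the cube-root version should be much closer to zero, consistent with \kappa_3(h(\bar{X})) dropping from O(N^{-2}) to O(N^{-3}).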

Source : Link , Question Author : AlexK , Answer Author : Jonathan Hahn
