# Derivation of normalizing transform for GLMs

$\newcommand{\E}{\mathbb{E}}$How is the $A(\cdot) = \displaystyle\int\frac{d\mu}{V^{1/3}(\mu)}$ normalizing transform for the exponential family derived?

More specifically: I tried to follow the Taylor expansion sketch on page 3, slide 1 here but have several questions. With $X$ from an exponential family, transformation $h(X)$, and $\kappa _i$ denoting the $i^{th}$ cumulant, the slides argue that:

and it remains to simply find $h(X)$ such that the above evaluates to 0.

1. My first question is about arithmetic: my Taylor expansion has different coefficients, and I can’t justify their having dropped many of the terms.

I can get to something similar by replacing the central moments by their cumulant equivalents, but it still doesn’t add up.

2. The second question: why does the analysis start with $\bar{X}$ instead of $X$, the quantity we actually care about?

The slides you link to are somewhat confusing, leaving out steps and making a few typos, but they are ultimately correct. It will help to answer question 2 first, then 1, and then finally derive the symmetrizing transformation $A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta$.

Question 2. We analyze $\bar{X}$ because it is the mean of a sample of $N$ i.i.d. random variables $X_1, ..., X_N$. This is an important quantity because sampling the same distribution and taking the mean happens all the time in science. We want to know how close $\bar{X}$ is to the true mean $\mu$. The Law of Large Numbers says $\bar{X}$ converges to $\mu$ as $N \to \infty$, and the Central Limit Theorem says its fluctuations around $\mu$ are approximately normal, but for finite $N$ we would also like to know the variance and skewness of $\bar{X}$.

Question 1. Your Taylor series approximation is not incorrect, but we need to be careful about keeping track of $\bar{X}$ vs. $X_i$ and powers of $N$ to get to the same conclusion as the slides. We’ll start with the definitions of $\bar{X}$ and central moments of $X_i$ and derive the formula for $\kappa_3(h(\bar{X}))$:

$\bar{X} = \frac{1}{N}\sum_{i=1}^N X_i$

$\mathbb{E}[X_i] = \mu$

$V(X_i) = \mathbb{E}[(X_i - \mu)^2] = \sigma^2$

$\kappa_3(X_i) = \mathbb{E}[(X_i - \mu)^3]$

Now, the central moments of $\bar{X}$:

$\mathbb{E}[\bar{X}] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[X_i] = \frac{1}{N}(N\mu) = \mu$

\begin{align} V(\bar{X}) &=\mathbb{E}[(\bar{X} - \mu)^2]\\ &=\mathbb{E}[\Big((\frac{1}{N}\sum_{i=1}^N X_i) - \mu\Big)^2]\\ &=\mathbb{E}[\Big(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\Big)^2]\\ &=\frac{1}{N^2}\Big(N\mathbb{E}[(X_i - \mu)^2] + N(N-1)\mathbb{E}[X_i - \mu]\mathbb{E}[X_j - \mu]\Big)\\ &= \frac{1}{N}\sigma^2 \end{align}

The last step follows since $\mathbb{E}[X_i - \mu] = 0$, and $\mathbb{E}[(X_i - \mu)^2] = \sigma^2$. This might not have been the easiest derivation of $V(\bar{X})$, but it is the same process we need to do to find $\kappa_3(\bar{X})$ and $\kappa_3(h(\bar{X}))$, where we break up a product of a summation and count the number of terms with powers of different variables. In the above case, there were $N$ terms that were of the form $(X_i - \mu)^2$ and $N(N-1)$ terms of the form $(X_i - \mu)(X_j - \mu)$.

\begin{align} \kappa_3(\bar{X}) &= \mathbb{E}[(\bar{X}-\mu)^3]\\ &= \mathbb{E}[\Big((\frac{1}{N}\sum_{i=1}^N X_i) - \mu\Big)^3]\\ &= \mathbb{E}[\Big(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\Big)^3]\\ &= \frac{1}{N^3}\Big(N\mathbb{E}[(X_i - \mu)^3] + 3N(N-1)\mathbb{E}[X_i - \mu]\mathbb{E}[(X_j - \mu)^2]+N(N-1)(N-2)\mathbb{E}[X_i - \mu]\mathbb{E}[X_j - \mu]\mathbb{E}[X_k - \mu]\Big)\\ &= \frac{1}{N^2}\mathbb{E}[(X_i - \mu)^3]\\ &= \frac{\kappa_3(X_i)}{N^2} \end{align}
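The two scaling results, $V(\bar{X}) = \sigma^2/N$ and $\kappa_3(\bar{X}) = \kappa_3(X_i)/N^2$, can be checked exactly on a small discrete example. The sketch below is only an illustration: the three-point distribution and $N = 6$ are arbitrary choices of mine, and the helper names are hypothetical. It builds the exact distribution of $\bar{X}$ by repeated convolution and compares central moments using rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(p, q):
    """Exact distribution of the sum of two independent discrete variables."""
    out = defaultdict(Fraction)
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] += px * qy
    return dict(out)


def central_moment(dist, k):
    """k-th central moment of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** k * p for x, p in dist.items())


# Assumed example: a skewed three-point distribution for each X_i.
x_dist = {0: Fraction(1, 2), 1: Fraction(3, 10), 4: Fraction(1, 5)}
N = 6

# Distribution of X_1 + ... + X_N, then rescale the support to get the mean.
sum_dist = x_dist
for _ in range(N - 1):
    sum_dist = convolve(sum_dist, x_dist)
mean_dist = {Fraction(s, N): p for s, p in sum_dist.items()}

var_x, k3_x = central_moment(x_dist, 2), central_moment(x_dist, 3)
var_mean, k3_mean = central_moment(mean_dist, 2), central_moment(mean_dist, 3)

print(var_mean == var_x / N)      # variance scales like 1/N
print(k3_mean == k3_x / N ** 2)   # third cumulant scales like 1/N^2
```

Because everything is a `Fraction`, the comparisons are exact identities rather than floating-point approximations.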

Next, we will expand $h(\bar{X})$ in a Taylor series as you have:

$h(\bar{X}) = h(\mu) + h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 + \frac{1}{6}h'''(\mu)(\bar{X}-\mu)^3 + ...$

\begin{align} \mathbb{E}[h(\bar{X})] &= h(\mu) + h'(\mu)\mathbb{E}[\bar{X} - \mu] + \frac{1}{2}h''(\mu)\mathbb{E}[(\bar{X}-\mu)^2] + \frac{1}{6}h'''(\mu)\mathbb{E}[(\bar{X}-\mu)^3] + ...\\ &= h(\mu) + \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} + \frac{1}{6}h'''(\mu)\frac{\kappa_3(X_i)}{N^2} + ... \end{align}
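One way to sanity-check the coefficients in this expansion is to pick a cubic $h$, for which the Taylor series terminates and the formula must hold exactly. The sketch below is a minimal illustration under my own assumptions (a made-up three-point distribution, $N = 4$, and $h(x) = x^3$). It enumerates every possible sample and compares $\mathbb{E}[h(\bar{X})]$ with $h(\mu) + \frac{1}{2}h''(\mu)\sigma^2/N + \frac{1}{3!}h'''(\mu)\kappa_3/N^2$, where $1/3! = 1/6$ is the standard third-order Taylor coefficient.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Assumed example distribution for each X_i: {value: probability}.
support = {0: Fraction(1, 2), 1: Fraction(3, 10), 4: Fraction(1, 5)}
N = 4


def expect_of_mean(h):
    """E[h(Xbar)] by brute-force enumeration of all N-tuples of draws."""
    total = Fraction(0)
    for draw in product(support, repeat=N):
        prob = prod((support[x] for x in draw), start=Fraction(1))
        total += prob * h(Fraction(sum(draw), N))
    return total


mu = sum(x * p for x, p in support.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in support.items())
kappa3 = sum((x - mu) ** 3 * p for x, p in support.items())

# h(x) = x^3, so h''(mu) = 6*mu, h'''(mu) = 6, and higher derivatives vanish.
lhs = expect_of_mean(lambda x: x ** 3)
rhs = (mu ** 3
       + Fraction(1, 2) * (6 * mu) * sigma2 / N
       + Fraction(1, 6) * 6 * kappa3 / N ** 2)

print(lhs == rhs)
```

For a general smooth $h$ the agreement is only up to higher-order terms; the cubic case is used here purely because it makes the comparison exact.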

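With some more effort you could show that the omitted terms are $O(N^{-2})$ or smaller (the fourth-order term already contributes at order $N^{-2}$, because $\mathbb{E}[(\bar{X}-\mu)^4]$ is itself of order $N^{-2}$), so that $\mathbb{E}[h(\bar{X})] = h(\mu) + \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} + O(N^{-2})$. Finally, since $\kappa_3(h(\bar{X})) = \mathbb{E}[(h(\bar{X})-\mathbb{E}[h(\bar{X})])^3]$ (which is not the same as $\mathbb{E}[(h(\bar{X})-h(\mu))^3]$), we again make a similar computation: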

\begin{align} \kappa_3(h(\bar{X})) &= \mathbb{E}[(h(\bar{X})-\mathbb{E}[h(\bar{X})])^3]\\ &=\mathbb{E}\Big[\Big(h(\mu) + h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 + O((\bar{X}-\mu)^3) - h(\mu) - \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} - O(N^{-2})\Big)^3\Big] \end{align}

We are only interested in the terms resulting in order $O(N^{-2})$, and with extra work you could show that you do not need the terms “$O((\bar{X}-\mu)^3)$” or “$- O(N^{-2})$” before taking the third power, as they will only result in terms of order $O(N^{-3})$. So, simplifying, we get

\begin{align} \kappa_3(h(\bar{X})) &= \mathbb{E}\Big[\Big(h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 - \frac{1}{2}h''(\mu)\frac{\sigma^2}{N}\Big)^3\Big]\\ &=\mathbb{E}\Big[h'(\mu)^3(\bar{X} - \mu)^3 + \frac{1}{8}h''(\mu)^3(\bar{X}-\mu)^6 - \frac{1}{8}h''(\mu)^3\frac{\sigma^6}{N^3} + \frac{3}{2}h'(\mu)^2h''(\mu)(\bar{X}-\mu)^4 + \frac{3}{4}h'(\mu)h''(\mu)^2(\bar{X}-\mu)^5 - \frac{3}{2}h'(\mu)^2h''(\mu)(\bar{X} - \mu)^2\frac{\sigma^2}{N} + O(N^{-3})\Big] \end{align}

I left off some terms that were obviously $O(N^{-3})$ in this product. You’ll have to convince yourself that the terms $\mathbb{E}[(\bar{X}-\mu)^5]$ and $\mathbb{E}[(\bar{X}-\mu)^6]$ are $O(N^{-3})$ as well. However,

\begin{align} \mathbb{E}[(\bar{X}-\mu)^4] &= \mathbb{E}[\frac{1}{N^4}\Big(\sum_{i=1}^N(X_i-\mu)\Big)^4]\\ &=\frac{1}{N^4}\Big(N\mathbb{E}[(X_i-\mu)^4] + 3N(N-1)\mathbb{E}[(X_i-\mu)^2]\mathbb{E}[(X_j-\mu)^2] + 0\Big)\\ &=\frac{3}{N^2}\sigma^4 + O(N^{-3}) \end{align}
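The combinatorial counts used in these expansions ($N$ terms with a single index, $3N(N-1)$ terms with the pattern described, and so on) can be confirmed by brute force for a small $N$. The following is a rough sketch with hypothetical helper names: it enumerates every index tuple that appears when $\big(\sum_i (X_i-\mu)\big)^p$ is multiplied out and groups the tuples by how often each distinct index occurs. Any pattern containing an index that occurs exactly once has zero expectation, which is why only the remaining patterns survive.

```python
from collections import Counter
from itertools import product


def pattern_counts(N, power):
    """Count index tuples (i_1, ..., i_power) by multiplicity signature.

    The signature (2, 2) means two distinct indices, each appearing twice;
    (3, 1) means one index three times and another index once, and so on.
    """
    counts = Counter()
    for idx in product(range(N), repeat=power):
        signature = tuple(sorted(Counter(idx).values(), reverse=True))
        counts[signature] += 1
    return counts


N = 5  # small enough to enumerate all N**4 tuples
c3 = pattern_counts(N, 3)
c4 = pattern_counts(N, 4)

# Third power: N, 3N(N-1), N(N-1)(N-2)
print(c3[(3,)], c3[(2, 1)], c3[(1, 1, 1)])
# Fourth power: the two patterns with no singleton index are N and 3N(N-1)
print(c4[(4,)], c4[(2, 2)])
```

The same function can be used with `power=5` or `power=6` to see why the fifth and sixth central moments of $\bar{X}$ are $O(N^{-3})$: the most numerous surviving patterns have at most three distinct indices, so their counts grow at most like $N^3$ against a prefactor of $N^{-5}$ or $N^{-6}$.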

Then distributing the expectation on our equation for $\kappa_3(h(\bar{X}))$, we have

\begin{align}\kappa_3(h(\bar{X})) &= h'(\mu)^3\mathbb{E}[(\bar{X} - \mu)^3] + \frac{3}{2}h'(\mu)^2h''(\mu)\mathbb{E}[(\bar{X}-\mu)^4] - \frac{3}{2}h'(\mu)^2h''(\mu)\mathbb{E}[(\bar{X} - \mu)^2]\frac{\sigma^2}{N} + O(N^{-3})\\ &= h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + \frac{9}{2}h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} - \frac{3}{2}h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} + O(N^{-3})\\ &=h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} + O(N^{-3}) \end{align}
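As a numerical illustration of this formula (a sketch under my own assumptions, not something taken from the slides), take $X_i \sim \text{Poisson}(\lambda)$, for which $\mu = \sigma^2 = \kappa_3 = \lambda$ and $N\bar{X} \sim \text{Poisson}(N\lambda)$. That lets us compute $N^2\kappa_3(h(\bar{X}))$ essentially exactly by summing over the Poisson pmf and compare it with the leading term $h'(\mu)^3\kappa_3 + 3h'(\mu)^2h''(\mu)\sigma^4$. The values $\lambda = 4$, $N = 400$, the truncation point, and the function names are arbitrary choices for the example.

```python
import math


def scaled_k3(h, lam, N):
    """N^2 * kappa_3(h(Xbar)) where N * Xbar ~ Poisson(N * lam)."""
    m = lam * N
    ks = range(int(3 * m) + 1)  # truncate far out in the upper tail
    pmf = [math.exp(k * math.log(m) - m - math.lgamma(k + 1)) for k in ks]
    total = math.fsum(pmf)
    ys = [h(k / N) for k in ks]
    mean = math.fsum(p * y for p, y in zip(pmf, ys)) / total
    third = math.fsum(p * (y - mean) ** 3 for p, y in zip(pmf, ys)) / total
    return N ** 2 * third


def predicted(dh, d2h, lam):
    """Leading coefficient h'^3 * kappa_3 + 3 * h'^2 * h'' * sigma^4 (Poisson)."""
    return dh(lam) ** 3 * lam + 3 * dh(lam) ** 2 * d2h(lam) * lam ** 2


lam, N = 4.0, 400
transforms = {
    # name: (h, h', h'')
    "identity": (lambda x: x, lambda x: 1.0, lambda x: 0.0),
    "sqrt": (math.sqrt, lambda x: 0.5 * x ** -0.5, lambda x: -0.25 * x ** -1.5),
    "two_thirds_power": (
        lambda x: x ** (2 / 3),
        lambda x: (2 / 3) * x ** (-1 / 3),
        lambda x: -(2 / 9) * x ** (-4 / 3),
    ),
}

results = {
    name: (scaled_k3(h, lam, N), predicted(dh, d2h, lam))
    for name, (h, dh, d2h) in transforms.items()
}
for name, (numeric, leading) in results.items():
    print(f"{name:>18}: numeric {numeric:+.5f}   leading term {leading:+.5f}")
```

The leading term for the $2/3$ power is zero for the Poisson family, so whatever the numeric column shows for that row is the higher-order remainder.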

This concludes the derivation of $\kappa_3(h(\bar{X}))$. Now, at last, we will derive the symmetrizing transform $A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta$.

For this transformation, it is important that $X_i$ comes from an exponential family distribution, and in particular a natural exponential family (or has been transformed into one), with density of the form $f_{X_i}(x;\theta) = h(x)\exp(\theta x - b(\theta))$. (Here $h(x)$ denotes the base measure of the family, not the transformation $h$ used above.)

In this case, the cumulants of the distribution are given by $\kappa_k = b^{(k)}(\theta)$. So $\mu = b'(\theta)$, $\sigma^2 = V(\theta) = b''(\theta)$, and $\kappa_3 = b'''(\theta)$. We can write the parameter $\theta$ as a function of $\mu$ by taking the inverse of $b'$, writing $\theta(\mu) = (b')^{-1}(\mu)$. Then

$\theta'(\mu) = \frac{1}{b''((b')^{-1}(\mu))} = \frac{1}{b''(\theta)} = \frac{1}{\sigma^2}$

Next we can write the variance as a function of $\mu$, and call this function $\bar{V}$:

$\bar{V}(\mu) = V(\theta(\mu)) = b''(\theta(\mu))$

Then

$\frac{d}{d\mu}\bar{V}(\mu) = V'(\theta(\mu))\theta'(\mu) = b'''(\theta)\frac{1}{\sigma^2} = \frac{\kappa_3}{\sigma^2}$

So as a function of $\mu$, $\kappa_3(\mu) = \bar{V}'(\mu)\bar{V}(\mu)$.
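As a quick sanity check of this identity (a worked example of my own, using the standard cumulant functions of two familiar families), consider the Poisson and Bernoulli cases:

\begin{align} \text{Poisson: }\quad & b(\theta) = e^\theta,\quad \mu = e^\theta,\quad \bar{V}(\mu) = \mu,\quad \bar{V}'(\mu)\bar{V}(\mu) = \mu = e^\theta = b'''(\theta)\\ \text{Bernoulli: }\quad & b(\theta) = \log(1+e^\theta),\quad \mu = \frac{e^\theta}{1+e^\theta},\quad \bar{V}(\mu) = \mu(1-\mu),\quad \bar{V}'(\mu)\bar{V}(\mu) = (1-2\mu)\mu(1-\mu) = b'''(\theta) \end{align}

In the Bernoulli case the last equality follows from $b''(\theta) = \mu(1-\mu)$ and $\frac{d\mu}{d\theta} = b''(\theta)$, so $b'''(\theta) = (1-2\mu)\frac{d\mu}{d\theta} = (1-2\mu)\mu(1-\mu)$, which is the familiar third central moment of a Bernoulli variable.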

Now, for the symmetrizing transformation, we want to reduce the skewness of $h(\bar{X})$ by making $h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} = 0$ so that $\kappa_3(h(\bar{X}))$ is $O(N^{-3})$. Thus, we want

$h'(\mu)^3\kappa_3(X_i) + 3h'(\mu)^2h''(\mu)\sigma^4 = 0$

Substituting our expressions for $\sigma^2$ and $\kappa_3$ as functions of $\mu$, we have:

$h'(\mu)^3\bar{V}'(\mu)\bar{V}(\mu) + 3h'(\mu)^2h''(\mu)\bar{V}(\mu)^2 = 0$

So $h'(\mu)^3\bar{V}'(\mu) + 3h'(\mu)^2h''(\mu)\bar{V}(\mu) = 0$, leading to $\frac{d}{d\mu}(h'(\mu)^3\bar{V}(\mu)) = 0$.

One solution to this differential equation is:

$h'(\mu)^3\bar{V}(\mu) = 1$,

$h'(\mu) = \frac{1}{[\bar{V}(\mu)]^{1/3}}$

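So, $h(\mu) = \int_c^\mu \frac{1}{[\bar{V}(\theta)]^{1/3}} d\theta$ for any constant $c$, where $\theta$ inside the integral is just a dummy variable ranging over values of the mean, not the natural parameter. This gives us the symmetrizing transformation $A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta$, where $V$ is the variance as a function of the mean (the function called $\bar{V}$ above) in a natural exponential family.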
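To make the result concrete, here is what the integral gives for a few standard variance functions (a worked illustration; the lower limit $c$ is chosen for convenience in each case, and constant factors in $\bar{V}$ only rescale $h$):

\begin{align} \text{Poisson, } \bar{V}(\mu) = \mu: \quad & h(\mu) = \int_0^\mu t^{-1/3}\,dt = \frac{3}{2}\mu^{2/3}\\ \text{Exponential/gamma, } \bar{V}(\mu) \propto \mu^2: \quad & h(\mu) \propto \int_0^\mu t^{-2/3}\,dt = 3\mu^{1/3}\\ \text{Inverse Gaussian, } \bar{V}(\mu) \propto \mu^3: \quad & h(\mu) \propto \int_1^\mu t^{-1}\,dt = \log\mu \end{align}

So the skewness-removing transformation is the $2/3$ power for Poisson means and the cube root for gamma means (the latter is the same power that appears in the Wilson-Hilferty approximation for the chi-square distribution). Since additive and multiplicative constants do not affect the third cumulant's leading term vanishing, $\mu^{2/3}$ and $\mu^{1/3}$ themselves work equally well.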