If X is normally distributed, can \log(X) also be normally distributed?

Suppose X is distributed N(\mu, \sigma^2) where \mu \neq 0.
Can I use the Delta Method to say that \log(X) \sim N(\log(\mu), \sigma^2/\mu^2)?

Answer

No. If X is exactly normal, \log(X) cannot also be exactly normal.

For \log(X) to be normal, X must be lognormal.

(Consider: if Z=\log(X) is normal, then X=\exp(Z) … and when you exponentiate a normal random variable, what you get is called a lognormal random variable.)

More generally, taking logs “pulls in” the more extreme values on the right (high values) relative to the median, while values at the far left (low values) get stretched further out. So if a distribution is symmetric before taking logs, it will be left-skewed afterward. This is a simple consequence of the shape of the function \log(x):

[Figure: the curve \log(x) with a dashed blue tangent line at the median m]

(The line is tangent to the curve. In general it doesn’t necessarily pass close to the origin; that’s just an artifact of the particular value of m in this case.)

Values very close to the median (indicated by an m in the plot) will experience an approximately linear rescaling (the dashed blue line). Values far above m will be pulled back toward m relative to that rescaling experienced by the middle values, while values far below m will be pulled further away from m, relative to that linear rescaling.

As a result, values at an equal distance d above and below m before transformation will not be equally distant from \log(m) afterward: the transformed value above will be closer to \log(m) than the transformed value below. This happens for every value of d.

So symmetric X implies asymmetric \log(X).
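
The asymmetry is easy to check numerically. The following is only a minimal sketch: the helper name log_distances and the choice m = 10 are illustrative assumptions, not part of the original answer. It compares how far \log(m - d) and \log(m + d) end up from \log(m).

```python
import math

def log_distances(m, d):
    """Return (distance below, distance above) from log(m) after taking logs.

    Hypothetical helper for illustration; requires 0 < d < m so that
    m - d stays positive and its log is defined.
    """
    if not 0 < d < m:
        raise ValueError("need 0 < d < m so that m - d stays positive")
    above = math.log(m + d) - math.log(m)   # distance of log(m + d) from log(m)
    below = math.log(m) - math.log(m - d)   # distance of log(m - d) from log(m)
    return below, above

m = 10.0  # illustrative median, chosen arbitrarily
for d in (0.1, 1.0, 5.0, 9.0):
    below, above = log_distances(m, d)
    print(f"d={d:4.1f}  below={below:.4f}  above={above:.4f}  ratio={below / above:.3f}")
```

The distance below is always the larger of the two, and the ratio grows with d, which is the left skew described above.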


Now let’s talk not about normality, but approximate normality. (For simplicity let’s assume that the distribution is such that the values are going to be essentially always positive – i.e. if the original values were normal, the chance of a negative value is extremely low.)

There is one situation where approximately normal values tend to still be approximately normal after transformation.

That’s when the standard deviation is very small compared to the mean (low coefficient of variation).

If you look at the above diagram, consider values on the x-axis in a very narrow band around m. The pulling-in/stretching-out effect is minimal (the black curve doesn’t have room to move far away from the blue tangent line), and so the shape still looks normal.

Here’s an example: the top plot is a set of approximately normal data (the Q-Q plot shows a fairly straight line), and its log is also approximately normal (the Q-Q plot still shows a fairly straight line). That’s because the coefficient of variation in the original values was pretty small (somewhere around 0.2 I think) – the nonlinear transformation was still nearly linear in the narrow range of values around the middle.

[Figure: approximately normal data and its log, each with a normal Q-Q plot]
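
A small simulation can illustrate the same effect without plots. This is a rough sketch under stated assumptions: the function names, the mean of 100, the sample size, and the seed are all arbitrary illustrative choices, and the rare non-positive draws are simply dropped so the log is defined. It reports the sample skewness of \log(X) for several coefficients of variation.

```python
import math
import random

def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_of_log(cv, mu=100.0, n=20000, seed=0):
    """Simulate X ~ N(mu, (cv * mu)^2) and return the skewness of log(X).

    Hypothetical helper; mu, n and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    draws = (rng.gauss(mu, cv * mu) for _ in range(n))
    # Drop the (very rare) non-positive draws, where the log is undefined.
    logs = [math.log(x) for x in draws if x > 0]
    return sample_skewness(logs)

for cv in (0.01, 0.05, 0.2):
    print(f"CV={cv:.2f}  skewness of log(X) = {skewness_of_log(cv):+.3f}")
```

With a small coefficient of variation the skewness should sit near zero, and it should become clearly negative as the coefficient of variation grows.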

In this situation, the delta method does indeed tend to be useful for giving approximate values of the mean and variance of the log-values, though the normal distribution it suggests is only an approximation; it is not the exact distribution of the log of an exactly normal random variate.
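
As a rough check of that claim, the sketch below simulates a normal variable with a small coefficient of variation and compares the simulated mean and variance of its log with the delta-method values \log(\mu) and \sigma^2/\mu^2 from the question. The function name delta_method_check and the values \mu = 100, \sigma = 5, the sample size, and the seed are assumptions made purely for illustration.

```python
import math
import random
import statistics

def delta_method_check(mu, sigma, n=50000, seed=1):
    """Compare simulated moments of log(X) with the delta-method values.

    Hypothetical helper; X ~ N(mu, sigma^2), with non-positive draws
    dropped so that the log is defined.
    """
    rng = random.Random(seed)
    draws = (rng.gauss(mu, sigma) for _ in range(n))
    logs = [math.log(x) for x in draws if x > 0]
    return {
        "sim_mean": statistics.fmean(logs),       # simulated mean of log(X)
        "sim_var": statistics.variance(logs),     # simulated variance of log(X)
        "delta_mean": math.log(mu),               # delta-method mean
        "delta_var": sigma ** 2 / mu ** 2,        # delta-method variance
    }

result = delta_method_check(mu=100.0, sigma=5.0)  # coefficient of variation 0.05
for name, value in result.items():
    print(f"{name:>10}: {value:.6f}")
```

The simulated and delta-method values should agree closely here; repeating the run with a larger \sigma relative to \mu should show the agreement degrade.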

Attribution
Source : Link , Question Author : JCWong , Answer Author : Glen_b