# What exactly are moments? How are they derived?

We are typically introduced to method of moments estimators by “equating population moments to their sample counterparts” until we have estimated all of the population’s parameters. In the case of a normal distribution, for example, we would only need the first and second moments, because they fully describe this distribution.

$E(X) = \mu \implies \sum_{i=1}^n X_i/n = \bar{X}$

$E(X^2) = \mu^2 + \sigma^2 \implies \sum_{i=1}^n X_i^2/n$

And we could theoretically compute up to $n$ additional moments as:

$E(X^r) \implies \sum_{i=1}^nX_i^r /n$
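As a minimal sketch of the equating step for the normal case, the code below computes sample raw moments $\sum_{i=1}^n X_i^r/n$ and solves $E(X) = \mu$, $E(X^2) = \mu^2 + \sigma^2$ for $\hat\mu$ and $\hat\sigma^2$. The function names and the data are hypothetical, chosen only for illustration.

```python
def sample_raw_moment(xs, r):
    """r-th sample raw moment: (1/n) * sum of x_i ** r."""
    return sum(x ** r for x in xs) / len(xs)


def normal_mom_estimates(xs):
    """Method-of-moments estimates (mu_hat, sigma2_hat) for a normal model.

    Replaces E(X) and E(X^2) with the first two sample raw moments and
    solves E(X) = mu, E(X^2) = mu^2 + sigma^2.
    """
    m1 = sample_raw_moment(xs, 1)
    m2 = sample_raw_moment(xs, 2)
    mu_hat = m1
    sigma2_hat = m2 - m1 ** 2  # divides by n, not n - 1
    return mu_hat, sigma2_hat


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(normal_mom_estimates(data))  # (5.0, 4.0)
```

Note that the resulting variance estimate divides by $n$, so it differs slightly from the usual unbiased sample variance.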

How can I build intuition for what moments really are? I know they exist as a concept in physics and in mathematics, but I find neither directly applicable, especially because I don’t know how to make the abstraction from the mass concept to a data point. The term seems to be used in a specific way in statistics, which differs from usage in other disciplines.

What characteristic of my data determines how many ($r$) moments there are overall?

It’s been a long time since I took a physics class, so let me know if any of this is incorrect.

## General description of moments with physical analogs

Take a random variable, $X$. The $n$-th moment of $X$ around $c$ is:

$E[(X - c)^n]$

This corresponds exactly to the physical sense of a moment. Imagine $X$ as a collection of points along the real line with density given by the pdf. Place a fulcrum under this line at $c$ and start calculating moments relative to that fulcrum, and the calculations will correspond exactly to statistical moments.
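To make the fulcrum picture concrete, here is a minimal sketch that treats a discrete distribution as point masses (the values are positions, the probabilities are masses) and computes the $n$-th moment around a fulcrum $c$ as $\sum w\,(x - c)^n$, which equals $E[(X - c)^n]$ when the weights sum to 1. The function `moment_about` and the three-point distribution are hypothetical.

```python
def moment_about(points, weights, c, n):
    """n-th moment around c of point masses: sum of w * (x - c) ** n.

    With weights summing to 1 (a discrete pmf) this is E[(X - c) ** n].
    """
    return sum(w * (x - c) ** n for x, w in zip(points, weights))


# Hypothetical distribution: three masses on the real line.
points = [0.0, 1.0, 4.0]
weights = [0.25, 0.5, 0.25]

# With the fulcrum at 0 the first moment (torque) is not zero...
print(moment_about(points, weights, 0.0, 1))  # 1.5
# ...but with the fulcrum at the center of mass, 1.5, it balances.
print(moment_about(points, weights, 1.5, 1))  # 0.0
```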

Most of the time, the $n$-th moment of $X$ refers to the moment around 0 (moments where the fulcrum is placed at 0):

$E[X^n]$

The $n$-th central moment of $X$ is:

$E[(X - E[X])^n]$

This corresponds to moments where the fulcrum is placed at the center of mass, so the distribution is balanced. It allows moments to be more easily interpreted, as we’ll see below. The first central moment will always be zero, because the distribution is balanced.
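Expanding the binomial inside the expectation shows how central moments relate to moments around 0 (writing $\mu = E[X]$). The $n = 1$ case confirms that the first central moment is zero, and the $n = 2$ case recovers the identity $E(X^2) = \mu^2 + \sigma^2$ used in the question:

$$
\begin{aligned}
E[(X-\mu)^n] &= \sum_{k=0}^{n} \binom{n}{k} E[X^k]\,(-\mu)^{n-k},\\
E[X-\mu] &= E[X] - \mu = 0,\\
E[(X-\mu)^2] &= E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2,\\
E[(X-\mu)^3] &= E[X^3] - 3\mu E[X^2] + 3\mu^2 E[X] - \mu^3 = E[X^3] - 3\mu E[X^2] + 2\mu^3.
\end{aligned}
$$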

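The $n$-th standardized moment of $X$ is:

$E\left[\left(\frac{X - \mu}{\sigma}\right)^n\right]$, where $\mu = E[X]$ and $\sigma$ is the standard deviation of $X$.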

This additionally scales the central moments by the spread of the distribution, which makes them easier to interpret, Kurtosis in particular. The first standardized moment will always be zero and the second will always be one. A standardized moment is simply the corresponding moment of the standard score (z-score) of the variable. I don’t have a great physical analog for this concept.
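A minimal sketch of the z-score view: standardize a sample, then average the $n$-th power of the standard scores. It assumes the divide-by-$n$ standard deviation, which mirrors the population definition and is what makes the first two standardized moments come out as exactly 0 and 1. The function name and data are hypothetical.

```python
import math


def standardized_moment(xs, n):
    """n-th sample standardized moment: the mean of z ** n, z = (x - mean) / sd.

    The standard deviation divides by len(xs) (not len(xs) - 1), matching
    the population definition E[((X - mu) / sigma) ** n].
    """
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    z = [(x - mean) / sd for x in xs]  # standard scores (z-scores)
    return sum(zi ** n for zi in z) / len(z)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standardized_moment(data, 1))  # 0.0
print(standardized_moment(data, 2))  # 1.0
```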

## Commonly used moments

For any distribution there are potentially infinitely many moments. Enough moments will almost always fully characterize a distribution (deriving the conditions under which this is guaranteed is part of the moment problem). Four moments come up a lot in statistics:

1. Mean – the 1st moment (centered around zero). It is the center of mass of the distribution, or alternatively it’s proportional to the moment of torque of the distribution relative to a fulcrum at 0.
2. Variance – the 2nd central moment. Interpreted as representing the degree to which the distribution of $X$ is spread out. It corresponds to the moment of inertia of a distribution balanced on its fulcrum.
3. Skewness – the 3rd central moment (sometimes standardized). A measure of how much a distribution is skewed in one direction or the other. Relative to a normal distribution (which has no skew), positively skewed distributions have a small probability of extremely high outcomes, and negatively skewed distributions have a small probability of extremely low outcomes. Physical analogs are difficult, but loosely it measures the asymmetry of a distribution. The figure below, taken from Wikipedia, shows an example.
4. Kurtosis – the 4th standardized moment, usually reported as excess Kurtosis (the 4th standardized moment minus three). Kurtosis measures the extent to which $X$ places more probability on the center of the distribution relative to the tails. Higher Kurtosis means that more of the variance comes from infrequent, large deviations from the mean rather than from frequent, modest ones. It is often interpreted relative to the normal distribution, which has a 4th standardized moment of 3, hence an excess Kurtosis of 0. A physical analog is even more difficult here, but in the figure below, taken from Wikipedia, the distributions with higher peaks have greater Kurtosis.
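The four quantities in the list can be computed directly from a discrete distribution. The sketch below is illustrative only: `four_moments` is a hypothetical helper, and the two example distributions were chosen to show the behaviour described above. A symmetric distribution with 90% of its mass at the center and rare deviations of $\pm 1$ has zero skewness and an excess Kurtosis of about 7. A distribution with a rare high outcome (0.1 probability of 1, otherwise 0) has positive skewness of about 2.67.

```python
import math


def four_moments(points, weights):
    """Mean, variance, skewness and excess kurtosis of a discrete distribution.

    `points` are the support values and `weights` their probabilities
    (assumed to sum to 1).
    """
    mean = sum(w * x for x, w in zip(points, weights))

    def central(n):
        # n-th central moment: E[(X - mean) ** n]
        return sum(w * (x - mean) ** n for x, w in zip(points, weights))

    var = central(2)
    sd = math.sqrt(var)
    skewness = central(3) / sd ** 3              # 3rd standardized moment
    excess_kurtosis = central(4) / var ** 2 - 3  # 4th standardized moment minus 3
    return mean, var, skewness, excess_kurtosis


# Symmetric, most mass at the center, rare deviations of +/-1.
print(four_moments([-1.0, 0.0, 1.0], [0.05, 0.9, 0.05]))
# A rare high outcome gives positive skew.
print(four_moments([0.0, 1.0], [0.9, 0.1]))
```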

We rarely talk about moments beyond Kurtosis, precisely because there is very little intuition behind them. This is similar to physicists stopping after the second moment.