# Should we teach kurtosis in an applied statistics course? If so, how?

Central tendency, spread and skewness can all be defined relatively well, at least on an intuitive basis; the standard mathematical measures of these things also correspond relatively well to our intuitive notions. But kurtosis seems to be different. It’s very confusing and it doesn’t match well with any intuition about distributional shape.

A typical explanation of kurtosis in an applied setting would be this extract from Applied statistics for business and management using Microsoft Excel \$^{[1]}\$:

> Kurtosis refers to how peaked a distribution is or conversely how flat it is. If there are more data values in the tails, than what you expect from a normal distribution, the kurtosis is positive. Conversely if there are less data values in the tails, than you would expect in a normal distribution, the kurtosis is negative. Excel cannot calculate this statistic unless you have at least four data values.

Aside from the confusion between “kurtosis” and “excess kurtosis” (as in this book, it is common to use the former word for what other authors call the latter), the interpretation in terms of “peakedness” or “flatness” is muddled by the switch of attention to how many items of data are in the tails. Considering both “peak” and “tails” is necessary: Kaplansky\$^{[2]}\$ complained in 1945 that many textbooks of the time erroneously stated that kurtosis was to do with how high the peak of the distribution is compared to that of a normal distribution, without considering the tails. But clearly having to consider the shape both at the peak and in the tails makes the intuition harder to grasp, a point the extract quoted above skips over by segueing from peakedness to heaviness of tails as if these concepts were the same.

Moreover this classical “peak and tails” explanation of kurtosis only works well for symmetric and unimodal distributions (indeed, the illustrated examples in that text are all symmetric). Yet the “correct” general way to interpret kurtosis, whether in terms of “peaks”, “tails” or “shoulders”, has been disputed for decades.\$^{[2][3][4][5][6]}\$

Is there an intuitive way of teaching kurtosis in an applied setting which will not hit contradictions or counterexamples when a more rigorous approach is taken? Is kurtosis even a useful concept at all in the context of this kind of applied data analysis course, as opposed to in mathematical statistics classes? If “peakedness” of a distribution is an intuitively useful concept, should we teach it by way of L-moments\$^{[7]}\$ instead?

\$[1]\$ Herkenhoff, L. and Fogli, J. (2013). Applied statistics for business and management using Microsoft Excel. New York, NY: Springer.

\$[2]\$ Kaplansky, I. (1945). “A common error concerning kurtosis”. Journal of the American Statistical Association 40(230): 259.

\$[3]\$ Darlington, R. B. (1970). “Is Kurtosis Really ‘Peakedness’?”. The American Statistician 24(2): 19–22.

\$[4]\$ Moors, J. J. A. (1986). “The meaning of kurtosis: Darlington reexamined”. The American Statistician 40(4): 283–284.

\$[5]\$ Balanda, K. P. and MacGillivray, H. L. (1988). “Kurtosis: A Critical Review”. The American Statistician 42(2): 111–119.

\$[6]\$ DeCarlo, L. T. (1997). “On the meaning and use of kurtosis”. Psychological Methods 2(3): 292.

\$[7]\$ Hosking, J. R. M. (1992). “Moments or L moments? An example comparing two measures of distributional shape”. The American Statistician 46(3): 186–189.

Kurtosis is really pretty simple … and useful. It is simply a measure of outliers, or tails. It has nothing to do with the peak whatsoever – that definition must be abandoned.

Here is a data set:
0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999

Notice that ‘999’ is an outlier.

Here are the \$z^4\$ values from the data set:

0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 360.98

Notice that only the outlier gives a \$z^4\$ that is noticeably different from 0.

The average of these \$z^4\$ values is the kurtosis of the empirical distribution (subtract 3 if you like, it doesn’t matter for the point I am making): 18.05

It should be obvious from this calculation that the data near the “peak” (the non-outlier data) contribute almost nothing to the kurtosis statistic.
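For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation in Python with NumPy. It assumes the \$z\$ values are standardized with the population standard deviation (divisor \$n\$, i.e. `ddof=0`), which is the convention that matches the figures quoted above. The variable names are only illustrative.

```python
import numpy as np

# The data set from the text; 999 is the single outlier.
data = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3,
                 2, 0, 2, 2, 3, 2, 5, 2, 3, 999], dtype=float)

# Standardize using the population SD (ddof=0). This divisor is an
# assumption; it is the one that reproduces the values quoted above.
z = (data - data.mean()) / data.std(ddof=0)

# Each observation's contribution to kurtosis is its z^4 value.
z4 = z ** 4

# Kurtosis (not excess kurtosis) is the average of the z^4 values.
kurtosis = z4.mean()

# Fraction of the total z^4 sum that comes from the largest value.
outlier_share = z4.max() / z4.sum()

print(np.round(z4, 2))
print(round(kurtosis, 2))
print(round(outlier_share, 4))
```

Under these assumptions the outlier accounts for essentially all of the sum of \$z^4\$ values, so the 19 observations near the “peak” barely move the statistic.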

Kurtosis is useful as a measure of outliers. Outliers are important to elementary students and therefore kurtosis should be taught. But kurtosis has virtually nothing to do with the peak, whether it is pointy, flat, bimodal or infinite. You can have all of the above with small kurtosis and all of the above with large kurtosis. So it should NEVER be presented as having anything to do with the peak, because that would be teaching incorrect information. It also makes the material needlessly confusing, and seemingly less useful.
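As one hedged illustration of that claim, the sketch below builds a perfectly flat “peak” (an evenly spaced grid on \$[-1, 1]\$) and then adds two far-out points. The grid size and the outlier values \$\pm 50\$ are arbitrary choices made for this example, not anything taken from the references, and the helper `kurtosis` is defined here rather than taken from a library.

```python
import numpy as np

def kurtosis(x):
    """Average of z^4, using the population SD (ddof=0)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=0)
    return float(np.mean(z ** 4))

# A completely flat centre: 1001 evenly spaced points on [-1, 1].
flat = np.linspace(-1.0, 1.0, 1001)

# The same flat centre plus two hypothetical outliers at -50 and +50.
flat_with_outliers = np.concatenate([flat, [-50.0, 50.0]])

k_flat = kurtosis(flat)
k_outliers = kurtosis(flat_with_outliers)

print(k_flat)      # close to the uniform-distribution value of 1.8
print(k_outliers)  # far larger, although the centre is unchanged
```

The centre of the data is identical in both cases, so the large difference in kurtosis comes entirely from the two added tail points.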

Summary:

1. kurtosis is useful as a measure of tails (outliers).
2. kurtosis has nothing to do with the peak.
3. kurtosis is practically useful and should be taught, but only as a measure of outliers. Do not mention peak when teaching kurtosis.

This article explains clearly why the “Peakedness” definition is now officially dead.

Westfall, P. H. (2014). “Kurtosis as Peakedness, 1905 – 2014. R.I.P.” The American Statistician 68(3): 191–195.