# How can a probability distribution diverge?

How can, for instance, the Gamma distribution diverge near zero (for an appropriate choice of shape and scale parameters, say shape $=0.1$ and scale $=10$) and still have its area equal to one?

As I understand it, the area under a probability density function should always equal one. If you take the Dirac delta distribution, which diverges at zero but is zero everywhere else, you have an area equal to one.

Somehow, if you were to take the area of a diverging Gamma distribution, you could express it as the area of a Dirac delta distribution, plus something more, since it has nonzero weight at $x\neq0$, so it would be bigger than one.

Can someone explain to me where my reasoning goes wrong?

> Somehow, if you would take the area of a diverging Gamma distribution, you could express it as the area of a Dirac delta distribution, plus something more since it has nonzero weight at $x \neq 0$, so it would be bigger than one.

That’s where your reasoning goes wrong: you can’t automatically express any function which is infinite at $x = 0$ as a delta distribution plus something more. After all, if you could do this with $\delta(x)$, who’s to say you couldn’t also do it with $2\delta(x)$? Or $10^{-10}\delta(x)$? Or any other coefficient? It’s just as valid to say that those distributions are zero for $x\neq 0$ and infinite at $x = 0$; why not use the same reasoning with them?

Actually, distributions (in the mathematical sense of distribution theory) should be thought of more like functions of functions – you put in a function and get out a number. For the delta distribution specifically, if you put in the function $f$, you get out the number $f(0)$. Distributions are not normal number-to-number functions. They’re more complicated, and more capable, than such “ordinary” functions.
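As a toy illustration of this "function in, number out" picture (a sketch of my own, not standard library code), one can model distributions literally as functions that take a test function and return a number. Note that $\delta$ and $2\delta$ are then plainly different objects, even though both "look like" zero everywhere away from the origin:

```python
def delta(f):
    """The delta distribution: feed in a function f, get back the number f(0)."""
    return f(0)

def scaled_delta(c):
    """c * delta: pointwise it also 'looks' zero away from 0, but it acts differently."""
    return lambda f: c * f(0)

print(delta(lambda x: x**2 + 3))            # -> 3
print(scaled_delta(2)(lambda x: x**2 + 3))  # -> 6
```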

This idea of turning a function into a number is quite familiar to anyone who’s used to dealing with probability. For example, the series of distribution moments – mean, standard deviation, skewness, kurtosis, and so on – can all be thought of as rules that turn a function (the probability distribution) into a number (the corresponding moment). Take the mean/expectation value, for instance. This rule turns a probability distribution $P(x)$ into the number $E_P[x]$, calculated as

$$E_P[x] = \int_{-\infty}^{\infty} x\,P(x)\,dx.$$

Or the rule for variance turns $P(x)$ into the number $\sigma_P^2$, where

$$\sigma_P^2 = \int_{-\infty}^{\infty} \bigl(x - E_P[x]\bigr)^2\,P(x)\,dx.$$

My notation is a little weird here, but hopefully you get the idea.¹
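To make these rules concrete, here is a small numerical sketch (my own; the Gaussian test density and all its parameters are arbitrary choices, not from the post): the mean and variance are implemented literally as functions that take a density $P$ and return a number.

```python
import math

def gaussian(x, mu=1.0, sigma=2.0):
    """A test density P(x): normal with mean 1 and standard deviation 2."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a=-50.0, b=50.0, n=100_000):
    """Midpoint-rule integral; the tails beyond +/-50 are negligible here."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Each "rule" takes the function P and returns a number.
def mean(P):
    return integrate(lambda x: x * P(x))

def variance(P):
    m = mean(P)
    return integrate(lambda x: (x - m)**2 * P(x))

print(round(mean(gaussian), 3), round(variance(gaussian), 3))   # -> 1.0 4.0
```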

You may notice something these rules have in common: in all of them, the way you get from the function to the number is by integrating the function times some other weighting function. This is a very common way to represent mathematical distributions. So it’s natural to wonder: is there some weighting function $\delta(x)$ that represents the action of the delta distribution in the same way,

$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = f(0)?$$

You can easily establish that if there is such a function, it has to be equal to $0$ at every $x\neq 0$. But you can’t get a value for $\delta(0)$ in this way. You can show that it must be larger than any finite number, but there is no actual value of $\delta(0)$ that makes this equation work out, using the standard ideas of integration.²

The reason for that is that there’s more to the delta distribution than just this:

$$\delta(x) = \begin{cases} \infty, & x = 0 \\ 0, & x \neq 0. \end{cases}$$

That “$\infty$” is misleading. It stands in for a whole extra set of information about the delta distribution that normal functions just can’t represent. And that’s why you can’t meaningfully say that the gamma distribution is “more” than the delta distribution. Sure, at any $x > 0$, the value of the gamma distribution is more than the value of the delta distribution, but all the useful information about the delta distribution is locked up in that point at $x = 0$, and that information is too rich and complex to allow you to say that one distribution is more than the other.
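To see concretely that the Gamma distribution's divergence at zero is harmless, one can check numerically that its area is still one. This is a sketch of my own (the cutoffs and grid sizes are arbitrary choices); it uses the question's parameters, shape $k=0.1$ and scale $\theta=10$, and the substitution $u = x^k$ to tame the singularity before integrating:

```python
import math

shape, scale = 0.1, 10.0                    # Gamma(k = 0.1, theta = 10)
norm = math.gamma(shape) * scale**shape     # normalizing constant Gamma(k) * theta^k

def pdf(x):
    """Gamma density: diverges like x**(k-1) as x -> 0+, since k < 1."""
    return x**(shape - 1) * math.exp(-x / scale) / norm

assert pdf(1e-12) > 1e9   # the density really does blow up near zero

# Yet the mass in (0, eps] is finite: for tiny x, pdf(x) ~ x**(k-1) / norm,
# and the integral of x**(k-1) from 0 to eps is eps**k / k, a finite number.
eps = 1e-6
mass_near_zero = eps**shape / (shape * norm)   # about 0.21 -- a fifth of all the mass!

# Total area: the substitution u = x**k removes the singularity entirely,
# turning pdf(x) dx into the bounded, smooth integrand g(u) du.
def g(u):
    return math.exp(-u**(1 / shape) / scale) / (shape * norm)

b, n = 200.0**shape, 200_000   # cutoff at x = 200; the tail mass beyond it is ~1e-11
h = b / n
area = sum(g((i + 0.5) * h) for i in range(n)) * h   # midpoint rule on [0, b]
print(round(area, 4))   # -> 1.0
```

The divergence at $x=0$ is an *integrable* singularity: the density is infinite there, but the area it contributes is perfectly finite, which is exactly what the delta distribution's $\infty$ is not.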

# Technical details

¹ Actually, you can flip things around and think of the probability distribution itself as the mathematical distribution. In this sense, the probability distribution is a rule that takes a weighting function, like $x$ or $(x - E[x])^2$, to a number, $E[x]$ or $\sigma_x^2$ respectively. If you think about it that way, the standard notation makes a bit more sense, but I think the overall idea is a bit less natural for a post about mathematical distributions.

² Specifically, by “standard ideas of integration” I’m talking about Riemann integration and Lebesgue integration, both of which have the property that two functions which differ only at a single point must have the same integral (given the same limits). If there were a function $\delta(x)$, it would differ from the function $0$ at only one point, namely $x = 0$, and thus the two functions’ integrals would always have to be the same.

So there is no number you can assign to $\delta(0)$ that makes it reproduce the effect of the delta distribution.
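This footnote can be illustrated numerically (a sketch of my own, with an arbitrary spike height and grid sizes): no matter how large a *finite* value you assign at the single point $x = 0$, its contribution to a Riemann sum is (height) × (cell width), which vanishes as the grid is refined.

```python
def spike(x, height=1e6):
    """Zero everywhere except a large -- but finite -- value at x = 0."""
    return height if x == 0.0 else 0.0

def riemann(f, a, b, n):
    """Left-endpoint Riemann sum with n cells on [a, b]."""
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h

# Powers of two keep the grid point x = 0 exactly representable in floats.
print(riemann(spike, -1.0, 1.0, 2**20))   # ~1.91: the spike contributes 1e6 * 2/2**20
print(riemann(spike, -1.0, 1.0, 2**22))   # ~0.48: shrinking toward 0 as n grows
```

However tall the finite spike, refining the grid drives the sum to zero, so no finite $\delta(0)$ can ever reproduce $\int f(x)\,\delta(x)\,dx = f(0)$.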