# How odd is a cluster of plane accidents?

Original question (7/25/14):
Does this quotation from the news media make sense, or is there a better statistical way of viewing the spate of recent plane accidents?

However, Barnett also draws attention to the theory of Poisson distribution, which implies that short intervals between crashes are actually more probable than long ones.

“Suppose that there is an average of one fatal accident per year, meaning that the chance of a crash on any given day is one in 365,” says Barnett. “If there is a crash on 1 August, the chance that the next crash occurs one day later on 2 August is 1/365. But the chance the next crash is on 3 August is (364/365) x (1/365), because the next crash occurs on 3 August only if there is no crash on 2 August.”

“It seems counterintuitive, but the conclusion follows relentlessly from the laws of probability,” Barnett says.

Clarification (7/27/14):
What is counterintuitive (to me) is saying that rare events tend to occur close in time. Intuitively, I would think that rare events would not occur close in time. Can anyone point me to a theoretical or empirical expected distribution of the time between events under the assumptions of a Poisson distribution? (That is, a histogram where the y-axis is frequency or probability and the x-axis is time between 2 consecutive occurrences grouped into days, weeks, months, or years, or the like.) Thanks.

Clarification (7/28/14):
The headline implies it is more likely to have clusters of accidents than widely spaced accidents. Let’s operationalize that. Let’s say that a cluster is 3 airplane accidents, a short period of time is 3 months, and a long period of time is 3 years. It seems illogical to think that there is a higher probability that 3 accidents will occur within a period of 3 months than within a period of 3 years. Even if we take the first accident as a given, it is illogical to think that 2 more accidents are more likely to occur within the next 3 months than within the next 3 years. If that is true, then the news media headline is misleading and incorrect. Am I missing something?

Summary: The first sentence in the quoted BBC paragraph is sloppy and misleading.

So let us assume that the probability of a plane crash on any given day is $p=1/365$ and that crashes are independent of each other. Let us further assume that one plane crashed on January 1st. When will the next plane crash?

Well, let us do a simple simulation: for each day for the next three years I will randomly decide if another plane crashed with probability $p$ and note the day of the next crash; I will repeat this procedure $100\,000$ times. Here is the resulting histogram:

In fact, the probability distribution is simply given by $\mathrm{Pr}(t) = (1-p)^t p$, where $t$ is the number of crash-free days before the next crash (so $t=0$ means a crash on the very next day). I plotted this theoretical distribution as a red line, and you can see that it fits the Monte Carlo histogram well. Remark: if time were discretized into smaller and smaller bins, this distribution would converge to an exponential one; but it does not really matter for this discussion.
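The simulation and its comparison with the formula can be sketched in Python roughly as follows. This is a minimal sketch, not the code behind the original figures: the names `simulate_gaps` and `geometric_pmf`, the fixed seed, and the reduced number of repetitions (10,000 instead of 100,000, to keep the run time short) are all assumptions made for illustration. Here `t` counts the crash-free days before the next crash, so `t = 0` means a crash on the very next day.

```python
import random

def simulate_gaps(p=1/365, horizon_days=3 * 365, n_trials=10_000, seed=0):
    """For each trial, return the number of crash-free days before the next crash.

    A trial with no crash inside the horizon is recorded as None.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        gap = None
        for t in range(horizon_days):
            if rng.random() < p:  # a crash happens on this day
                gap = t
                break
        gaps.append(gap)
    return gaps

def geometric_pmf(t, p=1/365):
    """Theoretical probability of exactly t crash-free days, then a crash."""
    return (1 - p) ** t * p

gaps = simulate_gaps()

# Compare simulation and theory on a coarse summary: next crash within 365 days.
within_year = sum(1 for g in gaps if g is not None and g < 365) / len(gaps)
expected = sum(geometric_pmf(t) for t in range(365))
print(f"simulated   P(next crash within 365 days): {within_year:.3f}")
print(f"theoretical P(next crash within 365 days): {expected:.3f}")
```

A histogram of the non-`None` entries of `gaps` (for instance with daily bins) should follow the same decreasing shape as `geometric_pmf`, up to sampling noise.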

As many people have already remarked here, it is a decreasing curve. This means that the probability that the next plane crashes on the next day, January 2nd, is higher than the probability that the next plane will crash on any other given day, e.g. on January 2nd next year (the difference is almost three-fold: $0.27\%$ and $0.10\%$).
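For reference, the two numbers can be worked out from $\mathrm{Pr}(t) = (1-p)^t p$ with $p = 1/365$, taking $t$ as the number of crash-free days before the next crash (so January 2nd is $t=0$ and January 2nd of the following year is $t=365$, assuming a non-leap year):

$$
\mathrm{Pr}(0) = p = \frac{1}{365} \approx 0.27\%,
\qquad
\mathrm{Pr}(365) = \left(\frac{364}{365}\right)^{365} \cdot \frac{1}{365} \approx 0.367 \times 0.00274 \approx 0.10\%,
$$

so the ratio is $\mathrm{Pr}(0)/\mathrm{Pr}(365) = (365/364)^{365} \approx 2.72$, which is close to $e$.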

However, if you ask what is the probability that the next plane crashes in the next three days, the answer is $0.8\%$, but if you ask what is the probability that it will crash after three days, but in the next three years, then the answer is $94\%$. So, obviously, it is more likely that it will crash in the next three years (but after the first three days) than in the next three days. The confusion arises because when you say “clustered events” you refer to a very small initial chunk of the distribution, but when you say “widely spaced” events you refer to a large chunk of it. That is why even with a monotonically decreasing probability distribution it is surely possible that “clusters” (e.g. two plane crashes in three days) are very unlikely.
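The two interval probabilities can be checked with a few lines of Python. This is a small sketch under the same assumptions as the text ($p = 1/365$, independent days); the helper name `prob_next_crash_between` is hypothetical, and days are counted from 1 starting with the day after the initial crash.

```python
p = 1 / 365

def prob_next_crash_between(first_day, last_day, p=p):
    """Probability that the next crash falls on day first_day..last_day (inclusive).

    Equals P(no crash on the first first_day - 1 days)
    minus  P(no crash on the first last_day days).
    """
    return (1 - p) ** (first_day - 1) - (1 - p) ** last_day

within_3_days = prob_next_crash_between(1, 3)
after_3_days_within_3_years = prob_next_crash_between(4, 3 * 365)

print(f"within the next 3 days:            {within_3_days:.1%}")                # about 0.8%
print(f"after 3 days, but within 3 years:  {after_3_days_within_3_years:.1%}")  # about 94%
```

The second interval is several hundred times longer than the first, which is why it collects far more probability even though each individual day in it is less likely than any of the first three.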

Here is another histogram to really get this point across. It is simply a sum of the previous histogram over several non-intersecting time periods: