# What distribution to use to model time before a train arrives?

I’m trying to model some data on train arrival times. I’d like to use a distribution that captures “the longer I wait, the more likely the train is going to show up”. It seems like such a distribution should look like a CDF, so that P(train shows up | waited 60 minutes) is close to 1. What distribution is appropriate to use here?

### Multiplication of two probabilities

The probability for a first arrival at a time between $$t$$ and $$t+dt$$ (the waiting time) is equal to the product of

• the probability for an arrival between $$t$$ and $$t+dt$$ (which can be related to the arrival rate $$s(t)$$ at time $$t$$)
• and the probability of no arrival before time $$t$$ (or otherwise it would not be the first).

This latter term is related to:

$$P(n=0,t+dt) = (1-s(t)dt) P(n=0,t)$$

or

$$\frac{\partial P(n=0,t)}{\partial t} = -s(t) P(n=0,t)$$

giving:

$$P(n=0,t) = e^{\int_0^t-s(t) dt}$$

and the probability density for the waiting time is:

$$f(t) = s(t)e^{\int_0^t-s(t) dt}$$
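As a rough numerical check of this formula, here is a minimal Python sketch that evaluates the waiting-time density for an arbitrary rate function. The function name `waiting_time_pdf`, the midpoint-rule integration, and the example values (a rate of 0.2 per minute, a 10-minute interval) are my own assumptions for illustration, not part of the derivation.

```python
import math

def waiting_time_pdf(s, t, steps=10_000):
    """Evaluate f(t) = s(t) * exp(-integral_0^t s(u) du) numerically.

    `s` is the arrival rate as a function of time (hypothetical helper).
    """
    # Midpoint rule for the cumulative rate integral_0^t s(u) du.
    h = t / steps
    cumulative_rate = sum(s((i + 0.5) * h) for i in range(steps)) * h
    return s(t) * math.exp(-cumulative_rate)

# Assumed example 1: constant rate of 0.2 arrivals per minute.
lam = 0.2
f_const = waiting_time_pdf(lambda u: lam, 5.0)
f_const_exact = lam * math.exp(-lam * 5.0)

# Assumed example 2: rate 1/(T - u) for a train due within T = 10 minutes.
T = 10.0
f_sched = waiting_time_pdf(lambda u: 1.0 / (T - u), 5.0)

print(f_const, f_const_exact)
print(f_sched, 1.0 / T)
```

In both cases the numerical value should agree closely with the closed form (the exponential density in the first case, $$1/T$$ in the second). The sketch only works for $$t$$ where the rate stays finite, so for the second example it needs $$t < T$$.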

### Derivation of cumulative distribution

Alternatively, you could start from the probability of fewer than one arrival (that is, no arrival) up to time $$t$$

$$P(n<1 \vert t) = F(n=0 \vert t)$$

and the probability density for an arrival between time $$t$$ and $$t+dt$$ is equal to minus the derivative

$$f_{\text{arrival time}}(t) = - \frac{d}{d t} F(n=0 \vert t)$$

This approach is useful, for instance, in deriving the gamma distribution as the waiting time for the n-th arrival in a Poisson process (see waiting-time-of-poisson-process-follows-gamma-distribution).
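To make the gamma remark concrete, the following sketch compares minus the numerical time derivative of the Poisson probability of fewer than n arrivals with the gamma (Erlang) density. The function names, the central-difference step, and the example values (n = 3, rate 0.2, t = 10) are assumptions chosen for illustration only.

```python
import math

def prob_fewer_than_n(n, lam, t):
    """P(fewer than n arrivals by time t) in a Poisson process with rate lam."""
    return sum(
        math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
        for k in range(n)
    )

def nth_arrival_pdf_numeric(n, lam, t, h=1e-5):
    """Minus the central-difference time derivative of prob_fewer_than_n."""
    upper = prob_fewer_than_n(n, lam, t + h)
    lower = prob_fewer_than_n(n, lam, t - h)
    return -(upper - lower) / (2 * h)

def gamma_pdf(n, lam, t):
    """Gamma (Erlang) density with integer shape n and rate lam."""
    return lam ** n * t ** (n - 1) * math.exp(-lam * t) / math.factorial(n - 1)

# Assumed example values: third arrival, 0.2 arrivals per minute, t = 10 minutes.
n, lam, t = 3, 0.2, 10.0
numeric = nth_arrival_pdf_numeric(n, lam, t)
exact = gamma_pdf(n, lam, t)
print(numeric, exact)
```

With n = 1 the same code reduces to the first-arrival case, where the density is the exponential one.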

### Two examples

• Exponential distribution: If the arrivals are random like a Poisson process, then $$s(t) = \lambda$$ is constant. The probability of a next arrival is independent of the previous waiting time without arrival (say, if you roll a fair die many times without a six, then for the next roll you will not suddenly have a higher probability of a six; see the gambler's fallacy). You will get the exponential distribution, and the pdf for the waiting times is: $$f(t) = \lambda e^{-\lambda t}$$

• Uniform (constant) distribution: If the arrivals occur at regular intervals (such as trains arriving according to a fixed schedule), then the probability of an arrival, given that a person has already been waiting for some time, increases. Say a train is supposed to arrive every $$T$$ minutes; then the rate, after already waiting $$t$$ minutes, is $$s(t) = 1/(T-t)$$, and the pdf for the waiting time will be: $$f(t)= \frac{e^{\int_0^t -\frac{1}{T-t} dt}}{T-t} = \frac{1}{T}$$ which makes sense, since every time between $$0$$ and $$T$$ should be equally likely to be the first arrival.
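The difference between the two cases shows up clearly in the probability that the train arrives within the next minute, given how long you have already waited. Below is a small sketch of that comparison. The helper names and the example numbers (rate 0.1 per minute, a train every 10 minutes, a 1-minute window) are assumptions for illustration, and the survival functions are simply the no-arrival probabilities $$P(n=0,t)$$ for each case.

```python
import math

def survival_exponential(t, lam):
    """P(no arrival before t) for a constant rate lam."""
    return math.exp(-lam * t)

def survival_uniform(t, T):
    """P(no arrival before t) when the arrival is uniform on [0, T]."""
    return max(0.0, 1.0 - t / T)

def prob_arrival_soon(survival, waited, window):
    """P(arrival within `window` | no arrival during the first `waited`).

    Requires survival(waited) > 0, so for the uniform case waited < T.
    """
    s_now = survival(waited)
    return (s_now - survival(waited + window)) / s_now

# Assumed example values.
lam, T, window = 0.1, 10.0, 1.0
waits = [0.0, 3.0, 6.0, 9.0]

exp_probs = [
    prob_arrival_soon(lambda t: survival_exponential(t, lam), w, window)
    for w in waits
]
uni_probs = [
    prob_arrival_soon(lambda t: survival_uniform(t, T), w, window)
    for w in waits
]

for w, pe, pu in zip(waits, exp_probs, uni_probs):
    print(f"waited {w:4.1f} min: exponential {pe:.3f}, uniform {pu:.3f}")
```

For the exponential case the conditional probability stays the same however long you have waited, while for the fixed-schedule case it grows as $$1/(T-t)$$ times the window and reaches 1 in the last minute.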

So it is this second case, where the probability of an arrival increases the longer a person has already been waiting, that relates to your question.

It might need some adjustments depending on your situation. With more information, the probability $$s(t) dt$$ for a train to arrive at a certain moment might be a more complex function.