I have a database of events (i.e. a variable of dates) and associated covariates.
The events are generated by the non-stationary Poisson process with parameter being an unknown (but possibly linear) function of some covariates.
I think the NHPoisson package exists just for this purpose; but after 15 hours of unsuccessful research I am still nowhere near getting to know how to use it.
Heck, I even tried reading both referenced books:
Coles, S. (2001). An introduction to statistical modelling of extreme values. Springer.
Casella, G. and Berger, R.L., (2002). Statistical inference. Brooks/Cole.
The one single example in the documentation of fitPP.fun doesn’t seem to fit my setup; I don’t have extreme values! I just have bare events.
Can anyone please help me with a simple example of fitting a Poisson process with parameter λ with single covariate X, and assumption, that λ=λ0+α⋅X? I’m interested in estimation of λ0 and α. I provide a two-column data set with times of events (let say, measured in seconds after some arbitrary time t0) and another column with values of the covariate X?
Fitting a stationary Poisson process
First of all it is important to realize, what sort of input data NHPoisson needs.
Foremost, NHPoisson needs a list of indices of event moments. If we record time interval and number of events in the time interval, than I we must translate it into a single column of dates, possibly “smearing” the dates over the interval they are recorded on.
For the simplicity I’ll assume, that we use time measured in seconds, and that the “second” is the natural unit of λ.
Let’s simulate data for a simple, stationary Poisson process, which has λ=1 events per minute:
lambda=1/60 #1 event per minute time.span=60*60*24 #24 hours, with time granularity one second aux<-simNHP.fun(rep(lambda,time.span))
simNHP.fun makes the simulation. We use to get
aux$posNH, a variable with indices of moments of simulated event firing. We can see that we roughly have 60*24 = 1440 events, by checking `length(aux$posNH).
Now let’s reverse-engineer the λ with
out<-fitPP.fun(posE=aux$posNH,n=time.span,start=list(b0=0)) # b0=0 is our guess at initial value for optimization, which is internally made with `nlminb` function
Because the function only takes events’ indices, it needs also a measure of how many possible indices are possible. And this is a very confusing part, because in the true Poisson process it is possible to have infinite number of possible events (if only λ>0). But from the perspective of the
fitPP we need to choose some small enough time unit. We choose it so small, that we can assume maximum one event per unit of time.
So what we do in fact is that we approximate the Poisson process with granular sequence of binomial events, each event spans exactly one unit of time, in analogy to the mechanism in which Poisson distribution can be seen as a limit of binomial distribution in the law of rare events.
Once we understand it, the rest is much simpler (at least for me).
To get the approximation of our λ me need to take exponent of the fitted parameter
exp(coef(out)). And here comes another important piece of information, that is missing in the manual: with
NHPoisson we fit to the logarithm of λ, not to λ itself.
Fitting a non-stationary Poisson process
NHPoisson fits the following model:
i.e. it fits linear combination of parameters →P (called covariates) to the logarithm of the λ.
Now let’s prepare non-stationary Poisson process.
time.span=60*60*24 #24 hours, with time granularity one second all.seconds<-seq(1,time.span,length.out=time.span) lambdas=0.05*exp(-0.0001*all.seconds) #we can't model a linear regression with NHPoisson. It must have the form with exp. aux<-simNHP.fun(lambdas)
Just as before,
aux$posNH would give us indices of events, but this time we will notice, that the intensity of the events diminish exponentially with time. And the rate of this diminishing is a parameter we want to estimate.
out<-fitPP.fun(tind=TRUE,covariates=cbind(all.seconds), posE=aux$posNH, start=list(b0=0,b1=0),modSim=TRUE)
It is important to note, that we need to put
all.seconds as a covariate, not
lambdas. The exponentiation/logaritmization is done internally by the
fitPP.fun. BTW, apart from predicted values, the function makes two graphs by default.
The last piece is a swiss-knife function for model validation,
aux<-globalval.fun(obFPP=out,lint=2000, covariates=cbind(all.seconds),typeI='Disjoint', typeRes='Raw',typeResLV='Raw',resqqplot=FALSE)
Among other things, the function divides the time into intervals, each
lint samples long, so it is possible to create crude graphs that compare predicted intensity with observed intensity.