How to predict when the next event occurs, based on times of previous events?

I’m a high school student and I’m working on a computer programming project, but I don’t have a lot of experience in statistics and modeling data beyond a high school statistics course so I’m kinda confused.

Basically, I have a reasonably large list (assume it’s large enough to meet the assumptions for any statistical tests or measures) of times that someone decided to print a document. Based on this list, I would like to construct a statistical model of some sort that will predict the most likely time for the next print job given all of the previous event times.

I’ve already read this, but the responses don’t exactly help with what I have in mind for my project. I did some additional research and found that a Hidden Markov Model would likely let me do this accurately, but I can’t find anything on how to build a Hidden Markov Model from just a list of times. I also found that running a Kalman filter on the list may be useful. Basically, I’d like to hear from someone who’s actually used these methods and knows their limitations and requirements before I just try something and hope it works.

Thanks a bunch!

Answer

Hidden Markov models would apply if the data were random emissions from some underlying unobserved Markov chain; I wouldn’t rule that out, but it doesn’t seem a very natural model for a list of event times.

I would think about point processes, which model events scattered in time and so match your particular data well. There is a great deal of work on using them to predict earthquakes (though I don’t know much about it) and even crime.

If there are many different people printing, and you’re just seeing the times but not the individual identities, a Poisson process might work well (the superposition of many independent point processes, each contributing only a small share of the events, is approximately Poisson). It would have to be inhomogeneous, though, meaning the rate of events varies over time: people are less likely to be printing at 3am than at 3pm.

For the inhomogeneous Poisson process model, the key would be getting a good estimate of the rate of print jobs (the intensity) at a particular time on a particular day.
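As a rough illustration of that estimation step, here is a minimal Python sketch. It assumes the print times are available as `datetime` objects and that the rate depends only on the hour of the day (ignoring weekday effects, holidays, and so on, which a real model would probably want). The function names `hourly_rates` and `prob_event_within` are made up for this example.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

def hourly_rates(times):
    """Estimate events per hour for each clock hour 0..23.

    Assumption: the rate depends only on the hour of day, so the estimate
    for hour h is (events seen in hour h) / (number of days observed).
    """
    if not times:
        raise ValueError("need at least one event time")
    n_days = (max(times).date() - min(times).date()).days + 1
    counts = Counter(t.hour for t in times)
    return [counts.get(h, 0) / n_days for h in range(24)]

def prob_event_within(rates, now, hours_ahead):
    """P(at least one event in the next `hours_ahead` hours).

    For a Poisson process this is 1 - exp(-expected count), where the
    expected count is the rate integrated over the window.
    """
    end = now + timedelta(hours=hours_ahead)
    expected = 0.0
    t = now
    while t < end:
        # Integrate up to the next clock-hour boundary or the window end.
        boundary = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        seg_end = min(boundary, end)
        expected += rates[t.hour] * (seg_end - t).total_seconds() / 3600.0
        t = seg_end
    return 1.0 - math.exp(-expected)
```

With the rates in hand, calling `prob_event_within(rates, datetime.now(), 0.5)` gives the model's probability of at least one print job in the next half hour. Note that this yields a distribution over waiting times rather than a single "most likely" next time.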

If these print times are for students in a classroom, though, it could be quite tricky, as they’re not likely to be independent and so the Poisson process wouldn’t work well.
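One rough way to check for that kind of dependence is the variance-to-mean ratio (index of dispersion) of counts in comparable windows. For Poisson counts the variance equals the mean, so a ratio well above 1 hints at bursts of related jobs. The sketch below assumes `datetime` inputs and compares the same clock hour across days; `dispersion_index` is a hypothetical helper name, and this is a quick diagnostic rather than a formal test.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, variance

def dispersion_index(times, hour):
    """Variance-to-mean ratio of per-day event counts within one clock hour.

    Roughly 1 is consistent with Poisson counts; much larger than 1
    suggests clustered (non-independent) events.
    """
    if not times:
        raise ValueError("need at least one event time")
    first_day = min(times).date()
    n_days = (max(times).date() - first_day).days + 1
    if n_days < 2:
        raise ValueError("need at least two days of data")
    per_day = Counter(t.date() for t in times if t.hour == hour)
    # Include days with zero events in that hour.
    counts = [per_day.get(first_day + timedelta(days=d), 0) for d in range(n_days)]
    m = mean(counts)
    if m == 0:
        raise ValueError("no events in that hour")
    return variance(counts) / m
```

For example, `dispersion_index(times, 15)` looks at the 3pm hour. With only a handful of days the ratio is very noisy, so it is best read as a hint.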

Here’s a link to a paper on the crime application.

Attribution
Source: Link, Question Author: ankushg, Answer Author: Karl
