# Hidden Markov model for event prediction

Question: Is the set-up below a sensible implementation of a Hidden Markov model?

I have a data set of 108,000 observations (taken over the course of 100 days) and approximately 2000 events throughout the whole observation time span. The data look like the figure below, where the observed variable can take 3 discrete values $\{1,2,3\}$ and the red columns highlight event times, i.e. the $t_E$’s. As shown with red rectangles in the figure, I have extracted the observations from $t_{E-5}$ to $t_E$ for each event, effectively treating these as “pre-event windows”.

HMM Training: I plan to train a Hidden Markov Model (HMM) based on all “pre-event windows”, using the multiple observation sequences methodology as suggested on Pg. 273 of Rabiner’s paper. Hopefully, this will allow me to train an HMM that captures the sequence patterns which lead to an event.

HMM Prediction: Then I plan to use this HMM to compute $\log[P(\text{Observations}|\text{HMM})]$ on a new day, where $\text{Observations}$ will be a sliding-window vector, updated in real time to contain the observations from $t-5$ to the current time $t$ as the day goes on.

I expect to see $\log[P(\text{Observations}|\text{HMM})]$ increase for $\text{Observations}$ that resemble the “pre-event windows”. This should in effect allow me to predict the events before they happen.
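The sliding-window scoring described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes a discrete-emission HMM whose parameters have already been trained, and the names `log_likelihood`, `sliding_scores`, `START`, `TRANS`, `EMIT`, as well as all numeric values, are hypothetical placeholders. The observed values 1, 2, 3 are mapped to indices 0, 1, 2, and the window length is 6 because the window runs from $t-5$ to $t$ inclusive.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM).

    obs   : list of symbol indices (0-based)
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)

    Assumes the sequence has non-zero probability under the model;
    otherwise math.log raises ValueError.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_p = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
        # Rescale at every step and accumulate the log of the scale factors,
        # which avoids numerical underflow on longer sequences.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def sliding_scores(series, window, start, trans, emit):
    """Score every full window of `window` consecutive observations."""
    return [
        log_likelihood(series[t - window + 1 : t + 1], start, trans, emit)
        for t in range(window - 1, len(series))
    ]


# Hypothetical 2-state, 3-symbol parameters, standing in for a trained model.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
EMIT = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]

raw_series = [1, 2, 3, 3, 2, 1, 1, 3, 3, 3]  # made-up observations in {1, 2, 3}
series = [x - 1 for x in raw_series]         # map to 0-based symbol indices
scores = sliding_scores(series, 6, START, TRANS, EMIT)
```

Each entry of `scores` is the log-likelihood of one six-observation window, so it is directly comparable across windows of the same length but not across windows of different lengths.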

One problem with the approach you’ve described is that you will need to define what kind of increase in $P(O|HMM)$ is meaningful, which may be difficult because $P(O|HMM)$ will generally be very small. It may be better to train two HMMs: say, HMM1 on observation sequences in which the event of interest occurs and HMM2 on observation sequences in which it doesn’t. Then, given an observation sequence $O$, you have
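\begin{align*} P(HMM1|O) &= \frac{P(O|HMM1)P(HMM1)}{P(O)} \\ &\propto P(O|HMM1)P(HMM1), \end{align*}
and similarly for HMM2. Because $P(O)$ is the same for both models, you would predict an event whenever HMM1 is the more probable model: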
\begin{align*} P(HMM1|O) &> P(HMM2|O) \\ \implies \frac{P(HMM1)P(O|HMM1)}{P(O)} &> \frac{P(HMM2)P(O|HMM2)}{P(O)} \\ \implies P(HMM1)P(O|HMM1) &> P(HMM2)P(O|HMM2). \end{align*}
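In practice the comparison is easier to carry out in log space, since both likelihoods are tiny. The sketch below assumes the two log-likelihoods $\log P(O|HMM1)$ and $\log P(O|HMM2)$ have already been computed (for example with the forward algorithm); the function name `classify_window` and the numbers in the usage lines are hypothetical. The prior $P(HMM1)$ has to be chosen by you. One rough option, given about 2000 events in 108,000 observations, is something near $2000/108000 \approx 0.0185$, but that figure is only an illustrative assumption.

```python
import math


def classify_window(log_lik_event, log_lik_background, prior_event):
    """Compare P(HMM1)P(O|HMM1) with P(HMM2)P(O|HMM2) in log space.

    log_lik_event      : log P(O | HMM1), the "event" model
    log_lik_background : log P(O | HMM2), the "no event" model
    prior_event        : P(HMM1); P(HMM2) is taken to be 1 - prior_event

    Returns (posterior probability of HMM1, True if HMM1 is more probable).
    """
    log_joint_event = log_lik_event + math.log(prior_event)
    log_joint_background = log_lik_background + math.log(1.0 - prior_event)

    # log P(O) via log-sum-exp, so neither joint is exponentiated directly.
    m = max(log_joint_event, log_joint_background)
    log_evidence = m + math.log(
        math.exp(log_joint_event - m) + math.exp(log_joint_background - m)
    )

    posterior_event = math.exp(log_joint_event - log_evidence)
    return posterior_event, log_joint_event > log_joint_background


# Hypothetical log-likelihoods for one window under the two models.
posterior_equal_prior, flag_equal_prior = classify_window(-4.0, -6.0, 0.5)
posterior_rare_event, flag_rare_event = classify_window(-4.0, -6.0, 0.0185)
```

With equal priors the decision reduces to comparing the two likelihoods. With a small event prior, HMM1 has to fit the window much better than HMM2 before the event is predicted, which is why the choice of prior matters here.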