**Question**

Is the set-up below a sensible implementation of a Hidden Markov model?

I have a data set of `108,000` observations (taken over the course of 100 days) and approximately `2000` events throughout the whole observation time-span. The data looks like the figure below, where the observed variable can take 3 discrete values $[1,2,3]$ and the red columns highlight event times, i.e. the $t_E$’s.

As shown with red rectangles in the figure, I have dissected {$t_E$ to $t_{E-5}$} for each event, effectively treating these as “pre-event windows”.

HMM Training: I plan to train a Hidden Markov Model (HMM) based on all “pre-event windows”, using the multiple observation sequences methodology as suggested on Pg. 273 of Rabiner’s paper. Hopefully, this will allow me to train an HMM that captures the sequence patterns which lead to an event.

HMM Prediction: Then I plan to use this HMM to predict $\log[P(\text{Observations}|\text{HMM})]$ on a new day, where $\text{Observations}$ will be a sliding window vector, updated in real-time to contain the observations between the current time $t$ and $t-5$ as the day goes on.

I expect to see $\log[P(\text{Observations}|\text{HMM})]$ increase for $\text{Observations}$ that resemble the “pre-event windows”. This should in effect allow me to predict the events before they happen.
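The sliding-window scoring described here can be sketched with the forward algorithm in log space. This is only a minimal illustration: the two-state parameters `START`, `TRANS` and `EMIT` are made-up placeholders standing in for a trained model, and the window width of 6 is one reading of "between $t$ and $t-5$".

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | HMM) for a discrete-emission HMM.

    obs   : list of symbols in {1, 2, 3}
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k + 1 | state i)
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0] - 1]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol - 1])
            for j in range(n)
        ]
    return logsumexp(alpha)

def sliding_scores(stream, width, start, trans, emit):
    """Score every window of `width` consecutive observations in `stream`."""
    return [
        log_likelihood(stream[t - width + 1 : t + 1], start, trans, emit)
        for t in range(width - 1, len(stream))
    ]

# Hypothetical parameters (assumption: NOT fitted to any real data).
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

if __name__ == "__main__":
    stream = [1, 1, 2, 3, 3, 3, 2, 1, 1, 3]
    for t, score in enumerate(sliding_scores(stream, 6, START, TRANS, EMIT), start=5):
        print(t, round(score, 3))
```

Working in log space matters because the raw probability of even a short window is a product of many factors below 1.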

**Answer**

One problem with the approach you’ve described is that you will need to define what kind of increase in $P(O|HMM)$ is meaningful, which may be difficult because $P(O|HMM)$ will always be very small in general. It may be better to train two HMMs, say HMM1 for observation sequences where the event of interest occurs and HMM2 for observation sequences where the event **doesn’t** occur. Then given an observation sequence $O$ you have

$$
\begin{align*}
P(HMM1|O) &= \frac{P(O|HMM1)P(HMM1)}{P(O)} \\
&\propto P(O|HMM1)P(HMM1)
\end{align*}
$$

and likewise for HMM2. Then you can predict the event will occur if

$$
\begin{align*}
P(HMM1|O) &> P(HMM2|O) \\
\implies \frac{P(HMM1)P(O|HMM1)}{P(O)} &> \frac{P(HMM2)P(O|HMM2)}{P(O)} \\
\implies P(HMM1)P(O|HMM1) &> P(HMM2)P(O|HMM2).
\end{align*}
$$
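As a rough sketch, the comparison above is easier to evaluate in log space, since both likelihoods are typically tiny. The function names and the example prior below are hypothetical, and the code assumes the two models are the only alternatives, so that $P(HMM2) = 1 - P(HMM1)$.

```python
import math

def predict_event(loglik_event, loglik_no_event, prior_event):
    """True if P(HMM1) P(O|HMM1) > P(HMM2) P(O|HMM2), compared in log space.

    loglik_event    : log P(O | HMM1)
    loglik_no_event : log P(O | HMM2)
    prior_event     : P(HMM1); assumption: P(HMM2) = 1 - P(HMM1)
    """
    score_event = loglik_event + math.log(prior_event)
    score_no_event = loglik_no_event + math.log(1.0 - prior_event)
    return score_event > score_no_event

def posterior_event(loglik_event, loglik_no_event, prior_event):
    """P(HMM1 | O), normalised over the two models without underflow."""
    a = loglik_event + math.log(prior_event)
    b = loglik_no_event + math.log(1.0 - prior_event)
    m = max(a, b)  # subtract the max before exponentiating
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

if __name__ == "__main__":
    # Hypothetical log-likelihoods for one window under each model.
    print(predict_event(-8.0, -10.0, prior_event=0.5))
    print(predict_event(-8.0, -10.0, prior_event=0.02))
    print(posterior_event(-8.0, -10.0, prior_event=0.02))
```

With rare events the prior can outweigh a moderately higher likelihood under HMM1, so the choice of $P(HMM1)$ deserves some care.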

*Disclaimer: What follows is based on my own personal experience, so take it for what it is.*

One of the nice things about HMMs is that they allow you to deal with variable-length sequences and variable-order effects (thanks to the hidden states). Sometimes this is necessary (as in many NLP applications). However, it seems like you have assumed a priori that only the last 5 observations are relevant for predicting the event of interest. If this assumption is realistic, then you may have significantly more luck using traditional techniques (logistic regression, naive Bayes, SVM, etc.) and simply using the last 5 observations as features/independent variables. Typically these types of models will be easier to train and (in my experience) produce better results.
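To make the fixed-window alternative concrete, here is a minimal sketch that turns each window into a feature vector and fits a small categorical naive Bayes classifier. Everything in it is illustrative: `make_windows`, `CategoricalNB`, the Laplace smoothing constant and the toy stream are assumptions, not part of the original set-up, and a library classifier would be the more usual choice in practice.

```python
import math
from collections import Counter

def make_windows(stream, event_times, width=6):
    """One feature tuple per time t (the last `width` observations up to t),
    labelled 1 if t is an event time and 0 otherwise."""
    events = set(event_times)
    X, y = [], []
    for t in range(width - 1, len(stream)):
        X.append(tuple(stream[t - width + 1 : t + 1]))
        y.append(1 if t in events else 0)
    return X, y

class CategoricalNB:
    """Naive Bayes over discrete symbols, one categorical feature per window position."""

    def __init__(self, symbols=(1, 2, 3), alpha=1.0):
        self.symbols = symbols
        self.alpha = alpha  # Laplace smoothing constant (assumption)

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.counts = Counter()  # (label, position, symbol) -> count
        for row, label in zip(X, y):
            for pos, sym in enumerate(row):
                self.counts[(label, pos, sym)] += 1
        return self

    def log_joint(self, row, label):
        """log P(label) + sum over positions of log P(symbol at position | label)."""
        n = self.class_counts[label]
        total = sum(self.class_counts.values())
        lp = math.log(n / total)
        for pos, sym in enumerate(row):
            num = self.counts[(label, pos, sym)] + self.alpha
            den = n + self.alpha * len(self.symbols)
            lp += math.log(num / den)
        return lp

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_joint(row, c))

if __name__ == "__main__":
    # Toy data (assumption): events happen right after two consecutive 3s.
    stream = [1, 1, 3, 3, 1, 2, 3, 3, 1, 1, 3, 3, 2, 1]
    X, y = make_windows(stream, event_times=[3, 7, 11], width=2)
    model = CategoricalNB().fit(X, y)
    print(model.predict((3, 3)), model.predict((1, 1)))
```

Because the window length is fixed, each position becomes an ordinary categorical feature, which is what lets a standard classifier replace the sequence model here.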

**Attribution**

*Source: Link, Question Author: Zhubarb, Answer Author: alto*