I suspect that a series of observed sequences are a Markov chain…

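$$X=\begin{pmatrix}
A & C & D & D & B & A & C\\
B & A & A & C & A & D & A\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
B & C & A & D & A & B & E
\end{pmatrix}$$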

However, how could I check that they indeed respect the memoryless property of $P(X_i = x_i \mid X_j = x_j)$?

Or at the very least prove that they are Markov in nature? Note these are empirically observed sequences. Any thoughts?

EDIT: Just to add, the aim is to compare a predicted set of sequences against the observed ones, so we'd appreciate comments on how best to compare these.

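First-order transition matrix $M_{ij} = \dfrac{x_{ij}}{\sum_{k}^{m} x_{ik}}$, where $m$ = A..E states:

$$M=\begin{pmatrix}
0.1834 & 0.3077 & 0.0769 & 0.1479 & 0.2840\\
0.4697 & 0.1136 & 0.0076 & 0.2500 & 0.1591\\
0.1827 & 0.2404 & 0.2212 & 0.1923 & 0.1635\\
0.2378 & 0.1818 & 0.0629 & 0.3357 & 0.1818\\
0.2458 & 0.1788 & 0.1173 & 0.1788 & 0.2793
\end{pmatrix}$$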
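For readers who want to reproduce this kind of estimate, here is a minimal sketch of the row-normalised count $M_{ij} = x_{ij} / \sum_k x_{ik}$. The function name `transition_matrix` and the string representation of each sequence are assumptions made for illustration, and the three toy rows are only the rows of $X$ that are visible above, so the output will not match the matrix $M$ quoted in the question.

```python
from collections import Counter

def transition_matrix(sequences, states):
    """Row-normalised first-order transition counts: M[i][j] = x_ij / sum_k x_ik."""
    counts = Counter()
    for seq in sequences:
        # Count every adjacent pair (X_i, X_{i+1}) within a sequence;
        # pairs never span two different sequences.
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    matrix = []
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # A state with no observed outgoing transition gets a row of zeros
        # instead of triggering a division by zero.
        matrix.append([counts[(a, b)] / row_total if row_total else 0.0
                       for b in states])
    return matrix

# Toy data only: the three rows of X that are shown in the question.
toy = ["ACDDBAC", "BAACADA", "BCADABE"]
M = transition_matrix(toy, "ABCDE")
```

Rows and columns follow the order of `states`, so `M[0][2]` is the estimated probability of moving from A to C.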

Eigenvalues of M

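$$E=\begin{pmatrix}
1.0000 & 0 & 0 & 0 & 0\\
0 & -0.2283 & 0 & 0 & 0\\
0 & 0 & 0.1344 & 0 & 0\\
0 & 0 & 0 & 0.1136-0.0430i & 0\\
0 & 0 & 0 & 0 & 0.1136+0.0430i
\end{pmatrix}$$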

Eigenvectors of M

$$V=\begin{pmatrix}
0.4472 & -0.5852 & -0.4219 & -0.2343-0.0421i & -0.2343+0.0421i\\
0.4472 & 0.7838 & -0.4211 & -0.4479-0.2723i & -0.4479+0.2723i\\
0.4472 & -0.2006 & 0.3725 & 0.6323 & 0.6323\\
0.4472 & -0.0010 & 0.7089 & 0.2123-0.0908i & 0.2123+0.0908i\\
0.4472 & 0.0540 & 0.0589 & 0.2546+0.3881i & 0.2546-0.3881i
\end{pmatrix}$$

**Answer**

I wonder if the following would give a valid Pearson $\chi^2$ test for proportions.

- Estimate the one-step transition probabilities — you’ve done that.
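- Obtain the two-step model probabilities:

  $$\hat p_{U,V} = {\rm Prob}[X_{i+2}=U \mid X_i=V] = \sum_{W\in\{A,B,C,D\}} {\rm Prob}[X_{i+2}=U \mid X_{i+1}=W]\,{\rm Prob}[X_{i+1}=W \mid X_i=V]$$
- Obtain the two-step empirical probabilities:

  $$\tilde p_{U,V} = \frac{\sum_i \#\{X_i=V,\ X_{i+2}=U\}}{\sum_i \#\{X_i=V\}}$$
- Form the Pearson test statistic:

  $$T_V = \#\{X_i=V\} \sum_U \frac{(\hat p_{U,V}-\tilde p_{U,V})^2}{\hat p_{U,V}}, \qquad T = T_A+T_B+T_C+T_D$$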
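The steps above can be sketched in code roughly as follows. This is only an illustration under stated assumptions: the name `two_step_pearson` is hypothetical, each sequence is taken to be a string of state labels, and $\#\{X_i=V\}$ is counted only over positions that actually have a two-step successor, which is one possible reading of the formula.

```python
from collections import Counter

def two_step_pearson(sequences, states):
    """Return ({V: T_V}, T) for the two-step Pearson statistic described above."""
    one, two = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):   # pairs (X_i, X_{i+1})
            one[(a, b)] += 1
        for a, c in zip(seq, seq[2:]):   # pairs (X_i, X_{i+2})
            two[(a, c)] += 1

    # One-step estimates: P[a][b] = Prob[X_{i+1} = b | X_i = a].
    P = {}
    for a in states:
        total = sum(one[(a, b)] for b in states)
        P[a] = {b: (one[(a, b)] / total if total else 0.0) for b in states}

    T = {}
    for v in states:
        # Assumption: count only the X_i = v that have an observed X_{i+2}.
        n_v = sum(two[(v, u)] for u in states)
        stat = 0.0
        for u in states:
            # Two-step model probability: sum over the intermediate state w.
            p_hat = sum(P[v][w] * P[w][u] for w in states)
            if n_v == 0 or p_hat == 0.0:
                # If p_hat is 0, no path v -> w -> u was ever observed,
                # so the empirical two-step count is 0 too; skip the cell.
                continue
            p_tilde = two[(v, u)] / n_v
            stat += (p_hat - p_tilde) ** 2 / p_hat
        T[v] = n_v * stat
    return T, sum(T.values())

# Tiny hand-checkable example, not the question's data.
T_by_state, T_total = two_step_pearson(["AABB"], "AB")
```

Passing the state set as an argument keeps the sketch independent of how many states the data actually contain.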

It is *tempting* for me to think that each $T_V \sim \chi^2_3$, so that the total $T \sim \chi^2_{12}$. However, I am not entirely sure of that, and would appreciate your thoughts on this. I am likewise not so certain about whether one needs to be paranoid about independence, and would want to split the sample in halves to estimate $\hat p$ and $\tilde p$.
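To make the degrees-of-freedom count explicit, here is the arithmetic behind that tempting guess, written for a general number of states $k$ (the symbol $k$ is my notation, not the answer's). It only restates the usual "categories minus one" rule for each conditioning state, and it is not a proof that the rule applies here, since the model probabilities are themselves estimated from the same data:

$$\mathrm{df}(T_V) = k - 1, \qquad \mathrm{df}(T) = k\,(k-1), \qquad k = 4 \;\Rightarrow\; \mathrm{df}(T_V) = 3,\quad \mathrm{df}(T) = 4 \cdot 3 = 12.$$

With the five states A..E that appear in the question's transition matrix, the same count would give 4 and 20 instead.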

**Attribution** *Source: Link, Question Author: HCAI, Answer Author: StasK*