# Estimating Markov transition probabilities from sequence data

I have a full set of sequences (432 observations, to be precise) over 4 states $A-D$, e.g.:

EDIT: The observation sequences are of unequal lengths! Does this change anything?

Is there a way of calculating the transition matrix in Matlab or R or similar? I think the HMM package might help. Any thoughts?

Please check the comments above. Here is a quick implementation in R:

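x <- c(1,2,1,1,3,4,4,1,2,4,1,4,3,4,4,4,3,1,3,2,3,3,3,4,2,2,3)
# p[i, j] first holds the number of observed transitions i -> j
p <- matrix(0, nrow = 4, ncol = 4)
for (t in 1:(length(x) - 1)) p[x[t], x[t + 1]] <- p[x[t], x[t + 1]] + 1
# Divide each row by its total to turn counts into probabilities
for (i in 1:4) p[i, ] <- p[i, ] / sum(p[i, ])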


Results:

> p
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1666667 0.3333333 0.3333333 0.1666667
[2,] 0.2000000 0.2000000 0.4000000 0.2000000
[3,] 0.1428571 0.1428571 0.2857143 0.4285714
[4,] 0.2500000 0.1250000 0.2500000 0.3750000
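
On the EDIT about unequal lengths: as far as I can tell, this does not change the estimator, provided you assume that all sequences follow the same transition matrix. You count transitions inside each sequence separately (so the end of one sequence is never paired with the start of the next), pool the counts, and normalise the rows. Below is a minimal sketch of that idea. The data in `seqs` are made up for illustration, the names `seqs`, `counts` and `n_states` are my own, and the states are assumed to be coded as integers 1 to 4.

```r
# Hypothetical example data: three sequences of unequal length over states 1..4
seqs <- list(
  c(1, 2, 1, 1, 3, 4),
  c(4, 1, 2, 4),
  c(3, 4, 4, 4, 3, 1, 3, 2)
)
n_states <- 4

# Accumulate transition counts within each sequence only, so that the last
# state of one sequence is never paired with the first state of the next.
counts <- matrix(0, nrow = n_states, ncol = n_states)
for (s in seqs) {
  if (length(s) < 2) next  # a single observation contains no transition
  for (t in seq_len(length(s) - 1)) {
    counts[s[t], s[t + 1]] <- counts[s[t], s[t + 1]] + 1
  }
}

# Row-normalise the pooled counts. A row with no observed outgoing
# transitions is set to NA instead of being divided by zero.
row_totals <- rowSums(counts)
p <- counts / row_totals
p[row_totals == 0, ] <- NA
```

If the states are stored as letters, something like `match(s, c("A", "B", "C", "D"))` should convert a sequence to the integer coding assumed here. Longer sequences simply contribute more transitions to the pooled counts, so no extra weighting is needed under the shared-matrix assumption.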


A (probably dumb) implementation in MATLAB follows. I have never used MATLAB, so I don’t know whether this will work; I just googled “declare vector matrix MATLAB” to get the syntax:

x = [1, 2, 1, 1, 3, 4, 4, 1, 2, 4, 1, 4, 3, 4, 4, 4, 3, 1, 3, 2, 3, 3, 3, 4, 2, 2, 3];
n = length(x) - 1;
p = zeros(4, 4);
% Count the observed transitions x(t) -> x(t + 1)
for t = 1:n
    p(x(t), x(t + 1)) = p(x(t), x(t + 1)) + 1;
end
% Normalise each row so that it sums to 1
for i = 1:4
    p(i, :) = p(i, :) / sum(p(i, :));
end
disp(p)