# Sequential pattern mining on single sequence

Can someone give me a hint about a good approach to find a frequent patterns in a single sequence.

For example there is the single sequence

3 6 1 2 7 3 8 9 7 2 2 0 2 7 2 8 4 8 9 7 2 4 1 0 3 2 7 2 0 3 8 9 7 2 0


I am looking for a method that can detect frequent patterns in this ordered sequence:

3 6 1 [2 7] 3 [8 9 7 2] 2 0 [2 7] 2 8 4 [8 9 7 2] 4 1 0 3 [2 7] 2 0 3 [8 9 7 2] 0


Also other information would be interesting like:

• What is the probability that 7 comes after 2
• When each number has a timestamp assigned to it, what is the estimated time interval that 7 occurs after 2

The sequential pattern mining methods I found require multiple sequences, but I have one large sequence where I want to detect regularities.

Thanks for any help!

Calculate a histogram of N-grams and threshold at an appropriate level. In Python:

from scipy.stats import itemfreq
s = '36127389722027284897241032720389720'
N = 2 # bi-grams
grams = [s[i:i+N] for i in xrange(len(s)-N)]
print itemfreq(grams)


The N-gram calculation (lines three and four) are from this answer.

The example output is

[['02' '1']
['03' '2']
['10' '1']
['12' '1']
['20' '2']
['22' '1']
['24' '1']
['27' '3']
['28' '1']
['32' '1']
['36' '1']
['38' '2']
['41' '1']
['48' '1']
['61' '1']
['72' '5']
['73' '1']
['84' '1']
['89' '3']
['97' '3']]


So 72 is the most frequent two-digit subsequence in your example, occurring a total of five times. You can run the code for all $N$ you are interested about.