Can someone give me a hint about a good approach to find a frequent patterns in a single sequence.

For example there is the single sequence

`3 6 1 2 7 3 8 9 7 2 2 0 2 7 2 8 4 8 9 7 2 4 1 0 3 2 7 2 0 3 8 9 7 2 0`

I am looking for a method that can detect frequent patterns in this ordered sequence:

`3 6 1 [2 7] 3 [8 9 7 2] 2 0 [2 7] 2 8 4 [8 9 7 2] 4 1 0 3 [2 7] 2 0 3 [8 9 7 2] 0`

Also other information would be interesting like:

- What is the probability that 7 comes after 2
- When each number has a timestamp assigned to it, what is the estimated time interval that 7 occurs after 2
The sequential pattern mining methods I found require multiple sequences, but I have one large sequence where I want to detect regularities.

Thanks for any help!

**Answer**

Calculate a histogram of N-grams and threshold at an appropriate level. In Python:

```
from scipy.stats import itemfreq
s = '36127389722027284897241032720389720'
N = 2 # bi-grams
grams = [s[i:i+N] for i in xrange(len(s)-N)]
print itemfreq(grams)
```

The N-gram calculation (lines three and four) are from this answer.

The example output is

```
[['02' '1']
['03' '2']
['10' '1']
['12' '1']
['20' '2']
['22' '1']
['24' '1']
['27' '3']
['28' '1']
['32' '1']
['36' '1']
['38' '2']
['41' '1']
['48' '1']
['61' '1']
['72' '5']
['73' '1']
['84' '1']
['89' '3']
['97' '3']]
```

So 72 is the most frequent two-digit subsequence in your example, occurring a total of five times. You can run the code for all N you are interested about.

**Attribution***Source : Link , Question Author : MikeHuber , Answer Author : Community*