Statistical similarity of time series

Suppose one has a time series from which one can take various measurements (period, maximum, minimum, average, etc.) and then uses these to create a model sine wave with the same attributes. Are there any statistical approaches that quantify how closely the actual data fit the assumed model? The series would contain between 10 and 50 data points.
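To make the question concrete, here is a minimal sketch of two common goodness-of-fit scores (RMSE and R²) between the data and a sine wave built from the measured attributes. The function names `sine_model` and `fit_scores` are made up for illustration, and the `phase` parameter is an assumption, since the measurements listed above do not pin down where the cycle starts.

```python
import math

def sine_model(n, period, lo, hi, mean, phase=0.0):
    """Build n samples of a sine wave from measured attributes.

    Amplitude is taken as half the max-min range and the wave is
    centred on the measured average (both are assumptions).
    """
    amp = (hi - lo) / 2.0
    return [mean + amp * math.sin(2 * math.pi * t / period + phase)
            for t in range(n)]

def fit_scores(data, model):
    """Return (RMSE, R^2) of the model against the data."""
    n = len(data)
    sse = sum((d - m) ** 2 for d, m in zip(data, model))  # residual sum of squares
    data_mean = sum(data) / n
    sst = sum((d - data_mean) ** 2 for d in data)         # total sum of squares
    rmse = math.sqrt(sse / n)
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")
    return rmse, r2
```

An RMSE near zero and an R² near one indicate a close fit. With only 10 to 50 points both scores will be noisy, so they are better used to compare candidate models on the same window than as absolute measures.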

A very simplistic first thought of mine was to ascribe a value to the directional movement of the sine wave, i.e. +1 +1 +1 +1 -1 -1 -1 -1 -1 -1 -1 -1 +1 +1 +1 +1, do the same to the actual data, and then somehow quantify the degree of similarity of directional movement.
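A minimal sketch of that directional idea: code each step as +1, -1 or 0, then report the fraction of steps on which data and model agree. The one-sided sign test is an addition of mine, not something from the idea above, and it assumes each step would match with probability 0.5 by chance, independently of the others. That assumption is doubtful for serially correlated data, so treat the p-value as a rough guide only.

```python
import math

def directions(series):
    """Sign of each successive change: +1 up, -1 down, 0 flat."""
    return [1 if b > a else -1 if b < a else 0
            for a, b in zip(series, series[1:])]

def directional_agreement(data, model):
    """Return (number of matching steps, total steps, fraction matching)."""
    d, m = directions(data), directions(model)
    matches = sum(1 for x, y in zip(d, m) if x == y)
    return matches, len(d), matches / len(d)

def sign_test_p(matches, n):
    """P(at least `matches` agreements in n steps) if each step matched
    with probability 0.5 independently (an assumption, see above)."""
    return sum(math.comb(n, k) for k in range(matches, n + 1)) / 2 ** n
```

For example, `directional_agreement([1, 2, 3, 2], [0, 1, 2, 1])` gives three matches out of three steps, and `sign_test_p(3, 3)` is 0.125.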

Edit: Having given more thought to what I really want to do with my data, and in light of responses to my original question, what I need is a decision-making algorithm to choose between competing assumptions: that my data are basically linear (or trending) with noise that could possibly have cyclic elements; that my data are basically cyclic with no directional trend to speak of; that the data are essentially just noise; or that the data are transitioning between any of these states.

My thoughts now are to combine some form of Bayesian analysis with a Euclidean/LMS metric. The steps in this approach would be:

Create the assumed sine wave from data measurements

Fit a LMS straight line to the data

Derive a Euclidean or LMS metric for departures from the original data for each of the above

Create a Bayesian prior for each based on this metric, e.g. if 60% of the combined departures attach to one model and 40% to the other, favour the model with 40%

Slide a window one data point along the data and repeat the above to obtain new % metrics for this slightly changed data set; this is the new evidence. Do the Bayesian analysis to create a posterior and change the probabilities that favour each assumption

Repeat along the whole data set (3000+ data points) with this sliding window (window length 10-50 data points). The hope/intent is to identify the predominant/favoured assumption at any point in the data set and how this changes with time
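The steps above can be sketched roughly as follows. This is only one possible reading of them, under several assumptions that are mine rather than part of the plan: the cycle `period` is supplied by the caller, the sine phase is picked by a coarse grid search, departures are turned into likelihoods by assuming Gaussian noise with a known `sigma`, and a `forget` factor discounts older evidence. The discounting matters because overlapping windows share most of their points, so treating each window as fresh independent evidence would overstate the certainty. All function and parameter names are hypothetical.

```python
import math

def line_sse(window):
    """Sum of squared residuals from a least-squares straight line."""
    n = len(window)
    t_mean = (n - 1) / 2.0
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    return sum((y - (y_mean + slope * (t - t_mean))) ** 2
               for t, y in enumerate(window))

def sine_sse(window, period, n_phases=16):
    """Sum of squared residuals from a sine built from window measurements.

    Mean and half-range come from the window; the phase is chosen by a
    coarse grid search (an assumption, not part of the stated steps).
    """
    mean = sum(window) / len(window)
    amp = (max(window) - min(window)) / 2.0
    best = float("inf")
    for k in range(n_phases):
        phase = 2 * math.pi * k / n_phases
        sse = sum((y - (mean + amp * math.sin(2 * math.pi * t / period + phase))) ** 2
                  for t, y in enumerate(window))
        best = min(best, sse)
    return best

def sliding_posteriors(data, window_len, period, sigma=1.0, forget=0.9):
    """Return a list of (P(line), P(sine)) pairs, one per window position."""
    log_post = [math.log(0.5), math.log(0.5)]  # equal prior for line and sine
    results = []
    for start in range(len(data) - window_len + 1):
        w = data[start:start + window_len]
        # Gaussian log-likelihoods up to a shared constant
        log_lik = [-line_sse(w) / (2 * sigma ** 2),
                   -sine_sse(w, period) / (2 * sigma ** 2)]
        # Bayes update, with the previous posterior discounted by `forget`
        log_post = [forget * lp + ll for lp, ll in zip(log_post, log_lik)]
        # Normalise in log space to avoid underflow
        top = max(log_post)
        norm = top + math.log(sum(math.exp(lp - top) for lp in log_post))
        log_post = [lp - norm for lp in log_post]
        results.append((math.exp(log_post[0]), math.exp(log_post[1])))
    return results
```

A "just noise" hypothesis could be added as a third entry using the residuals about the window mean, although the straight line would then need a penalty for its extra parameter, since it can never fit worse than a constant. The choice of `sigma` controls how sharply the probabilities swing, so it would need tuning to the noise level of the real data.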

Any comments on this potential methodology would be welcome, particularly on how I could actually implement the Bayesian analysis part.


The Euclidean distance is a common metric in machine learning. The following slides provide a good overview of this area along with references:

Also see the references on Keogh’s benchmarks page for time series classification:

Source: Link, Question Author: babelproofreader, Answer Author: ars