# Quantifying similarity between two data sets

Summary: Trying to find the best method summarize the similarity between two aligned data sets of data using a single value.

Details:

My question is best explained with a diagram. The graphs below show two different data sets, each with values labeled `nf` and `nr`. The points along the x-axis represent where measurements were taken, and the values on the y-axis are the resulting measured value.

For each graph I want a single number to summarize the similarity of `nf` and `nr` values at each measurement point. In this example it’s visually obvious that the results in the first graphs are less similar than the ones in the second graph. But I have a lot of other data where the difference is less obvious, so being able to rank this quantitatively would be helpful.

I thought there might be a standard techniques that is typically used. Searching for statistical similarity has given a lot of different results, but I’m not sure what is best to pick or if things I’ve ready apply to my problem. So I thought this question might be worth asking here in case there is a simple answer.

Area between 2 curves may give you the difference. Hence `sum(nr-nf)` (sum of all differences) will be an approximation of the area between 2 curves. If you want to make it relative, `sum(nr-nf)/sum(nf)` can be used. These will give you a single value indicating similarity between 2 curves for each graph.