Quantifying similarity between two data sets

Summary: Trying to find the best method summarize the similarity between two aligned data sets of data using a single value.

Details:

My question is best explained with a diagram. The graphs below show two different data sets, each with values labeled nf and nr. The points along the x-axis represent where measurements were taken, and the values on the y-axis are the resulting measured value.

For each graph I want a single number to summarize the similarity of nf and nr values at each measurement point. In this example it’s visually obvious that the results in the first graphs are less similar than the ones in the second graph. But I have a lot of other data where the difference is less obvious, so being able to rank this quantitatively would be helpful.

I thought there might be a standard techniques that is typically used. Searching for statistical similarity has given a lot of different results, but I’m not sure what is best to pick or if things I’ve ready apply to my problem. So I thought this question might be worth asking here in case there is a simple answer.

enter image description here

Answer

Area between 2 curves may give you the difference. Hence sum(nr-nf) (sum of all differences) will be an approximation of the area between 2 curves. If you want to make it relative, sum(nr-nf)/sum(nf) can be used. These will give you a single value indicating similarity between 2 curves for each graph.

Edit: Above method of sum of differences will be useful even if these are separate points or observations and not connected lines or curves, but in that case, mean of differences can also be an indicator and may be better since it would take into account the number of observations.

Attribution
Source : Link , Question Author : Gabriel Southern , Answer Author : rnso

Leave a Comment