How to tell quantitatively whether 1D data is clustered around 1 or 3 values?

I’ve got some data on the intervals between a person’s heartbeats. One indication of ectopic (extra) beats is that these intervals cluster around three values instead of one. How can I obtain a quantitative measure of this?

I’m looking to compare multiple data sets, and these two 100-bin histograms are representative of all of them.

[Image: two 100-bin histograms of the inter-beat intervals]

I could compare the variances, but I want my algorithm to be able to detect whether there are one or three clusters in each case without comparing to the other cases.

This is for offline processing, so there’s a lot of computation power available, if that’s needed.

Answer

I strongly advise against using k-means here. Its results for different values of k are not easily comparable, and the method is only a crude heuristic. If you really want to use clustering, use EM clustering (that is, fit a Gaussian mixture model), since your data appears to consist of normal distributions. And validate your results!
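If you do go the EM route, the idea can be sketched roughly as follows: fit a 1-component and a 3-component Gaussian mixture to the raw intervals with EM and compare them with an information criterion. This is only a minimal illustration in plain Python. The function names, the range-based initialisation, the `min_sigma` floor, the choice of BIC as the comparison score, and the synthetic intervals in the demo are all my assumptions, not something taken from the question.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def fit_gmm_1d(data, k, n_iter=100, min_sigma=1e-3):
    """Fit a k-component 1D Gaussian mixture with EM.

    Returns (log_likelihood, (weights, means, sigmas)).
    min_sigma is an assumed floor that keeps components from collapsing.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    mean = sum(data) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Assumed initialisation: means spaced evenly across the data range.
    mus = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    sigmas = [max(spread / k, min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    loglik = sum(math.log(mixture_pdf(x, weights, mus, sigmas) or 1e-300) for x in data)
    return loglik, (weights, mus, sigmas)


def bic(loglik, k, n):
    """Bayesian information criterion; a k-component 1D mixture has 3k - 1 free parameters."""
    return (3 * k - 1) * math.log(n) - 2.0 * loglik


def preferred_components(data, candidates=(1, 3)):
    """Return (best_k, {k: BIC}) where best_k has the lowest BIC."""
    n = len(data)
    scores = {k: bic(fit_gmm_1d(data, k)[0], k, n) for k in candidates}
    return min(scores, key=scores.get), scores


def synthetic_intervals(clusters, seed=0):
    """Hypothetical test data: clusters is a list of (count, mean, sigma) tuples."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for count, mu, sigma in clusters for _ in range(count)]


if __name__ == "__main__":
    # Made-up inter-beat intervals in seconds, not real measurements.
    one = synthetic_intervals([(600, 0.80, 0.05)])
    three = synthetic_intervals([(120, 0.55, 0.03), (360, 0.80, 0.03), (120, 1.05, 0.03)])
    print(preferred_components(one))
    print(preferred_components(three))
```

Each data set is scored on its own, so no comparison to other recordings is needed. The BIC penalty is what makes the 1-component and 3-component fits comparable, which is exactly what k-means lacks.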

Instead, the obvious approach is to fit a single Gaussian function and, separately, a sum of three Gaussian functions to the histogram (for example using the Levenberg-Marquardt method), with the three possibly constrained to the same height to avoid degenerate solutions.

Then test which of the two models fits better.
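A rough sketch of this curve-fitting approach, assuming NumPy and SciPy are available: `scipy.optimize.curve_fit` with `method="lm"` performs the Levenberg-Marquardt fit on the 100-bin histogram. The starting values and the use of a least-squares BIC as the "which fits better" test are my own assumptions; any other goodness-of-fit comparison could be substituted.

```python
import numpy as np
from scipy.optimize import curve_fit


def one_gaussian(x, height, mu, sigma):
    """Single Gaussian bump with peak value `height`."""
    return height * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def three_gaussians(x, height, mu1, mu2, mu3, s1, s2, s3):
    """Sum of three Gaussians constrained to share one height."""
    return (one_gaussian(x, height, mu1, s1)
            + one_gaussian(x, height, mu2, s2)
            + one_gaussian(x, height, mu3, s3))


def least_squares_bic(rss, n_points, n_params):
    """BIC under an assumed Gaussian error model for the bin counts."""
    return n_points * np.log(rss / n_points) + n_params * np.log(n_points)


def compare_fits(samples, bins=100):
    """Fit both models to the histogram of `samples` and report which has lower BIC."""
    samples = np.asarray(samples, dtype=float)
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    lo, hi = centers[0], centers[-1]
    span = hi - lo

    # Assumed starting guesses: one peak at the sample mean, or three
    # peaks spread evenly across the observed range.
    p0_one = [counts.max(), samples.mean(), samples.std()]
    p0_three = [counts.max(),
                lo + span / 6, lo + span / 2, hi - span / 6,
                span / 10, span / 10, span / 10]

    fit_one, _ = curve_fit(one_gaussian, centers, counts,
                           p0=p0_one, method="lm", maxfev=20000)
    fit_three, _ = curve_fit(three_gaussians, centers, counts,
                             p0=p0_three, method="lm", maxfev=20000)

    rss_one = float(np.sum((counts - one_gaussian(centers, *fit_one)) ** 2))
    rss_three = float(np.sum((counts - three_gaussians(centers, *fit_three)) ** 2))

    n = len(centers)
    bic_one = least_squares_bic(rss_one, n, len(fit_one))
    bic_three = least_squares_bic(rss_three, n, len(fit_three))
    return {
        "preferred": 1 if bic_one <= bic_three else 3,
        "bic": {1: bic_one, 3: bic_three},
        "params": {1: fit_one, 3: fit_three},
    }
```

Calling `compare_fits(intervals)["preferred"]` on one recording returns 1 or 3 without reference to any other recording. Note that the shared-height constraint fits best when the three clusters have similar peak heights. If the side clusters are much smaller than the main one, giving each Gaussian its own height may be the better trade-off.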

Attribution
Source: Link, Question Author: Nikolaus, Answer Author: Has QUIT–Anony-Mousse
