What are the best methods for fitting the ‘mode’ of data sampled from a continuous distribution?
Since the mode is technically undefined (right?) for a continuous distribution, I’m really asking ‘how do you find the most common value’?
If you assume the parent distribution is Gaussian, you could bin the data and take the mode to be the location of the bin with the greatest count. However, how do you determine the bin size? Are there robust implementations available (i.e., robust to outliers)? I use numpy, but I can probably translate R without too much difficulty.
Here is a method in R that does not rely on parametric modelling of the underlying distribution. It applies the default kernel density estimator to 10000 gamma-distributed variates:
    x <- rgamma(10000, 2, 5)
    z <- density(x)
    plot(z)  # always good to check visually
    z$x[z$y==max(z$y)]
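returns 0.199, which is the value of x estimated to have the highest density (the density estimates are stored as “z$y”). The exact figure varies from run to run because the sample is random; the true mode of this gamma distribution is (shape − 1)/rate = (2 − 1)/5 = 0.2.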
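Since the question mentions numpy, here is a rough numpy-only sketch of the same idea: build a Gaussian kernel density estimate and take the grid point where it peaks. The function name kde_mode and its grid_size parameter are made up for illustration. The bandwidth uses Silverman's rule of thumb, which is what R's default bw.nrd0 computes, but this is not a port of density() (R bins the data and uses an FFT), so the two will agree only approximately.

```python
import numpy as np

def kde_mode(samples, grid_size=512):
    """Estimate the mode as the argmax of a Gaussian kernel density estimate.

    Hypothetical helper for illustration; assumes the sample is not constant.
    """
    x = np.asarray(samples, dtype=float)
    n = x.size

    # Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    bandwidth = 0.9 * spread * n ** (-0.2)

    # Evaluation grid, extended 3 bandwidths past the data on each side
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_size)

    # Average of Gaussian kernels centred on each sample, evaluated on the grid
    z = (grid[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

    return grid[np.argmax(density)]

rng = np.random.default_rng(0)
# numpy parameterises the gamma by scale, so rate 5 becomes scale 1/5
x = rng.gamma(shape=2.0, scale=1.0 / 5.0, size=10_000)
mode_estimate = kde_mode(x)
print(mode_estimate)
```

The estimate should come out close to the theoretical mode of 0.2, though the kernel smoothing and the hard boundary at zero bias it slightly. The intermediate array has grid_size × n entries, so for much larger samples you would want to evaluate the grid in chunks or bin the data first.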