I am conducting some research which involves visually/graphically observing the differences between the shapes of the distributions of different samples.
I would like to automate this process (at least somewhat), so that I can scale the number of samples I look at (as well as speeding things up, reducing human error etc.).
Is there a way to quantitatively describe/measure the shape of a distribution so that comparisons between shapes can be made algorithmically?
If the problem is univariate, then why not just do a KS test on the (centered, rescaled) vectors?
You can’t use the associated p-values (because the center and scale components have been determined by the data), but the D statistic gives a relative measure of the distance between the two vectors (in a nutshell, it’s simply the Chebyshev distance between the two empirical CDFs).
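To make the "Chebyshev distance between the two CDFs" remark concrete, here is a minimal sketch that computes the largest vertical gap between two empirical CDFs by hand and checks it against what `ks.test` reports. The helper name `ks_distance`, the seed and the two toy samples are my own assumptions for illustration, not part of the question.

```r
# Hypothetical helper: largest vertical gap between two empirical CDFs.
ks_distance <- function(a, b) {
  # Both ECDFs are step functions, so the gap can only change at observed values.
  grid <- sort(unique(c(a, b)))
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

set.seed(1)                 # arbitrary seed, for reproducibility only
a <- rnorm(50)              # toy sample 1 (assumed for illustration)
b <- rexp(70)               # toy sample 2, different length on purpose

d_manual <- ks_distance(a, b)
d_ks     <- unname(ks.test(a, b)$statistic)

# Should be TRUE (up to floating-point tolerance).
all.equal(d_manual, d_ks)
```

Since D is a maximum of absolute differences between two functions bounded in [0, 1], it always lies in [0, 1], which makes it easy to compare across pairs of samples.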
In R, it would look like the following, assuming x and y are two vectors of potentially different lengths (each vector contains one of the samples whose distribution shapes you want to compare).
For example, if x∼P(λ) and y∼N(μ,σ²):
    # two distributions with different shapes
    y <- rnorm(100, 0, 3)
    x <- rpois(100, 1)

    # center by the median and rescale by the MAD
    x_s <- (x - median(x)) / mad(x)
    y_s <- (y - median(y)) / mad(y)

    par(mfrow = c(2, 1))
    hist(y_s)
    hist(x_s)

    ks.test(x_s, y_s)
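Since the goal is to scale this up to many samples, the same idea can be wrapped in a loop that returns a matrix of pairwise D statistics. This is only a sketch under my own assumptions: the helper names `standardize` and `pairwise_ks`, the zero-MAD guard, and the three toy samples are all hypothetical and not from the question.

```r
# Hypothetical helper: robustly center and rescale one sample.
standardize <- function(v) {
  s <- mad(v)
  # Assumption: a zero MAD (e.g. heavily tied data) is treated as an error.
  if (s == 0) stop("MAD is zero; pick another scale estimate for this sample")
  (v - median(v)) / s
}

# Hypothetical helper: symmetric matrix of pairwise KS D statistics.
pairwise_ks <- function(samples) {
  z <- lapply(samples, standardize)
  k <- length(z)
  d <- matrix(0, k, k, dimnames = list(names(samples), names(samples)))
  if (k < 2) return(d)
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # Only D is kept; the p-value is not valid after standardizing.
      # suppressWarnings() hides possible warnings about ties in discrete data.
      d[i, j] <- d[j, i] <- suppressWarnings(ks.test(z[[i]], z[[j]])$statistic)
    }
  }
  d
}

set.seed(42)                          # arbitrary seed, for reproducibility only
samples <- list(                      # toy data, assumed for illustration
  normal_a = rnorm(1000, 0, 3),
  normal_b = rnorm(1000, 10, 1),
  expo     = rexp(1000, 2)
)

d <- pairwise_ks(samples)
round(d, 2)
```

The two normal samples differ in location and scale but not in shape, so their entry should be small compared with either normal-versus-exponential entry. If you want groups of similar shapes, `hclust(as.dist(d))` is one possible next step.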
P.S. I left the original answer, because it seemed to be useful and frankly took me time to write. @Modo: let me know if it’s better to remove it.