Let's say I have two or more sample populations of n-dimensional continuous-valued vectors. Is there a nonparametric way to test whether these samples come from the same distribution? If so, is there a function in R or Python for this?
I did a lot of research on multivariate two-sample tests after I realized that the Kolmogorov-Smirnov test isn't multivariate. I looked at the chi-squared test, Hotelling's T^2, Anderson-Darling, the Cramér-von Mises criterion, Shapiro-Wilk, etc. You have to be careful, because some of these tests require the vectors being compared to have the same length. Others are only used to reject the assumption of normality, not to compare two sample distributions.
The leading solution, the Peacock test, seems to compare the two samples' cumulative distribution functions under all possible orderings, which, as you may suspect, is very computationally intensive: on the order of minutes for a single run on a sample containing a few thousand records.
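To make the cost of that exhaustive comparison concrete, here is a minimal Python sketch of a brute-force two-dimensional KS-type statistic in the spirit of the Peacock test. It is not the code from any particular package: the function name `peacock_2d_statistic` is hypothetical, and the handling of points that fall exactly on a quadrant boundary (`<=` versus `>`) is my own assumption. It tries every (x, y) pair built from the pooled coordinates as a quadrant origin, so the naive cost grows roughly cubically with the sample size.

```python
import numpy as np


def peacock_2d_statistic(a, b):
    """Largest difference in quadrant fractions over all candidate origins.

    a, b: array-likes of shape (n, 2) and (m, 2).
    Hypothetical helper; boundary handling is an assumption of this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.vstack([a, b])
    xs = np.unique(pooled[:, 0])  # candidate x-coordinates for the origin
    ys = np.unique(pooled[:, 1])  # candidate y-coordinates for the origin

    d_max = 0.0
    for x in xs:
        a_left, b_left = a[:, 0] <= x, b[:, 0] <= x
        for y in ys:
            a_low, b_low = a[:, 1] <= y, b[:, 1] <= y
            # Visit the four quadrants around (x, y).
            for ax, bx in ((a_left, b_left), (~a_left, ~b_left)):
                for ay, by in ((a_low, b_low), (~a_low, ~b_low)):
                    diff = abs(np.mean(ax & ay) - np.mean(bx & by))
                    d_max = max(d_max, float(diff))
    return d_max


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same = peacock_2d_statistic(rng.normal(size=(50, 2)), rng.normal(size=(50, 2)))
    shifted = peacock_2d_statistic(rng.normal(size=(50, 2)), rng.normal(loc=2.0, size=(50, 2)))
    print(same, shifted)
```

With n points per sample there are on the order of n^2 candidate origins and each one needs a pass over the data, which is why a run on a few thousand records gets slow.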
As Xiao's documentation states, the Fasano and Franceschini test is a variant of the Peacock test.
The Fasano and Franceschini test was specifically intended to be less computationally intensive, but I have not found an implementation of their work in R.
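For illustration, here is a rough Python sketch of the two-dimensional Fasano and Franceschini statistic as I understand it: instead of every (x, y) combination, only the observed data points are used as quadrant origins, once with the first sample's points and once with the second's, and the two maxima are averaged. The function names are hypothetical, points lying exactly on a quadrant boundary are simply not counted (an assumption of this sketch), and the p-value comes from a generic permutation test that I swapped in for the analytic approximation in the original paper.

```python
import numpy as np


def _max_quadrant_diff(origins, a, b):
    """Max difference in quadrant fractions, using each row of `origins` as origin."""
    d_max = 0.0
    for x0, y0 in origins:
        for sx in (1, -1):
            for sy in (1, -1):
                # Strict inequalities: boundary points are ignored (assumption).
                fa = np.mean((sx * (a[:, 0] - x0) > 0) & (sy * (a[:, 1] - y0) > 0))
                fb = np.mean((sx * (b[:, 0] - x0) > 0) & (sy * (b[:, 1] - y0) > 0))
                d_max = max(d_max, float(abs(fa - fb)))
    return d_max


def ff_statistic(a, b):
    """Average of the maxima obtained with origins taken from a and from b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 0.5 * (_max_quadrant_diff(a, a, b) + _max_quadrant_diff(b, a, b))


def permutation_pvalue(a, b, n_perm=200, seed=0):
    """Permutation p-value for ff_statistic (not the paper's analytic formula)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = ff_statistic(a, b)
    pooled = np.vstack([a, b])
    n = len(a)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # shuffle the group labels
        if ff_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(40, 2))
    y = rng.normal(loc=1.0, size=(40, 2))
    print(ff_statistic(x, y), permutation_pvalue(x, y, n_perm=100))
```

Restricting the origins to the data points cuts the number of candidate origins from roughly n^2 to roughly n, which is where the savings over the exhaustive version come from.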
For those of you who want to explore the computational aspects of the Peacock versus the Fasano and Franceschini test, check out "Computationally efficient algorithms for the two-dimensional Kolmogorov–Smirnov test".