How can the F-test reject the null hypothesis while the KS test does not?

Suppose I have two datasets, $\mathbf{a}$ and $\mathbf{b}$. I want to test whether the two datasets are different in a statistically significant way.

To compute the F-test, I take the ratio of the sample variances of the two datasets and compare this to critical values of the F distribution, which depend on the significance level (e.g. $\alpha = 0.05$) and the degrees of freedom of each dataset. If the F value I computed lies outside the interval between the lower and upper critical values, then the null hypothesis is rejected (i.e. the two datasets are different in a statistically significant way).
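For concreteness, here is a minimal sketch of that procedure in Python, assuming NumPy and SciPy are available and that both datasets are roughly normal (which the F-test requires). The function name `f_test_equal_variance`, the sample sizes, the standard deviations, and the seed are illustrative choices, not part of the original problem.

```python
import numpy as np
from scipy import stats

def f_test_equal_variance(a, b, alpha=0.05):
    """Two-sided F-test for equal variances (assumes normally distributed data)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Ratio of the unbiased sample variances.
    f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = a.size - 1, b.size - 1
    # Lower and upper critical values of the F distribution.
    lower = stats.f.ppf(alpha / 2, dfn, dfd)
    upper = stats.f.ppf(1 - alpha / 2, dfn, dfd)
    reject = bool(f_stat < lower or f_stat > upper)
    return f_stat, (lower, upper), reject

# Illustrative data only: same mean, different spread.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.0, 1.5, size=50)
f_stat, (lower, upper), reject = f_test_equal_variance(a, b)
print(f"F = {f_stat:.3f}, critical values = ({lower:.3f}, {upper:.3f}), reject = {reject}")
```

Note that the critical values come from the quantiles of the F distribution at $\alpha/2$ and $1-\alpha/2$, so they bracket 1 but are generally not symmetric around it.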

To compute the KS test, I find the ECDF of each dataset and then find the maximum vertical distance between the ECDFs, which is the D-statistic. Similarly to the F-test, if the D-statistic is greater than some critical value, the null hypothesis is rejected (i.e. the two datasets are different in a statistically significant way).
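As a rough sketch of the D-statistic, the snippet below evaluates both ECDFs at every observed point and takes the largest gap, then compares the result with SciPy's `scipy.stats.ks_2samp`. The helper name `ks_statistic` and the simulated data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def ks_statistic(a, b):
    """Maximum vertical distance between the ECDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The ECDFs only change at observed values, so checking those is enough.
    grid = np.concatenate([a, b])
    ecdf_a = np.searchsorted(a, grid, side="right") / a.size
    ecdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(ecdf_a - ecdf_b)))

# Illustrative data only: same mean, different spread.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.0, 1.5, size=50)

d_manual = ks_statistic(a, b)
result = stats.ks_2samp(a, b)  # two-sided test by default
print(f"D (manual) = {d_manual:.3f}, D (scipy) = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```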

My intuition is that the tests should generally give similar results. If something is statistically significant, it should be statistically significant for both tests, no? Perhaps this intuition is wrong. But, at the very least, I thought that the KS test was more sensitive than the F-test. As such, if the F-test rejects the null hypothesis, then I thought for sure, the KS test would also reject the null.

But I have found many cases where this is not true. I have some examples where the F-test results in rejection of the null hypothesis while the KS test does not!

Any explanation of why this could be is appreciated.


Significance testing consists of defining a rejection region and rejecting the null hypothesis if the data fall in that region. The size of the region, measured as its probability under the null hypothesis, is its $\alpha$ value. If two rejection regions have different shapes, then even if one is smaller than the other, there can be points that are inside the smaller one but not inside the larger one.

Dave’s answer explains that the KS test is sensitive to many different attributes, such as mean, variance, and multimodality. Suppose we restrict our attention to just mean and variance. We can then represent the sample on a two-dimensional plot, with, say, the difference in mean on the horizontal axis and the difference in variance on the vertical axis:

[Figure: Illustration of rejection regions]

The $F$-test’s rejection region (blue) consists of two horizontal strips in this space: if the difference in variance is too positive or too negative, it rejects the null. The KS test’s rejection region (green) is (with some simplification) a ring: anything too far from the origin in any direction will be rejected. We can (again, with some simplification) consider each test to have a “radius”, and anything outside that radius results in the null being rejected. But for the $F$-test, only the vertical distance from the $x$-axis is considered, while for the KS test the distance from the origin is considered.

If both have the same $\alpha$, then since the KS test looks at both dimensions, its radius has to be larger. So if your sample has a small difference in mean and a difference in variance that is slightly more than the $F$-test’s “radius”, then it will still be within the KS radius: the $F$-test rejects the null while the KS test does not.
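A small simulation can make this concrete. The sketch below draws pairs of normal samples with equal means and moderately different standard deviations, runs both tests at the same $\alpha$, and counts how often the $F$-test rejects while the KS test does not. Everything specific here (the helper names `f_test_pvalue` and `compare_tests`, the sample size, the standard deviations, the number of simulations, and the seed) is an assumption chosen for illustration, and the exact rates will vary with those choices.

```python
import numpy as np
from scipy import stats

def f_test_pvalue(a, b):
    """Two-sided p-value of the F-test for equal variances (assumes normal data)."""
    f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    tail = min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    return min(1.0, 2.0 * tail)

def compare_tests(n=50, sd_b=1.5, alpha=0.05, n_sims=1000, seed=0):
    """Return rejection rates of both tests and the rate of 'F rejects, KS does not'."""
    rng = np.random.default_rng(seed)
    f_rejects = np.empty(n_sims, dtype=bool)
    ks_rejects = np.empty(n_sims, dtype=bool)
    for i in range(n_sims):
        # Equal means, unequal spread: the difference lies only on the "variance axis".
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(0.0, sd_b, size=n)
        f_rejects[i] = f_test_pvalue(a, b) < alpha
        ks_rejects[i] = stats.ks_2samp(a, b).pvalue < alpha
    f_only = f_rejects & ~ks_rejects
    return f_rejects.mean(), ks_rejects.mean(), f_only.mean()

f_rate, ks_rate, f_only_rate = compare_tests()
print(f"F-test rejects:           {f_rate:.2%}")
print(f"KS test rejects:          {ks_rate:.2%}")
print(f"F rejects but KS doesn't: {f_only_rate:.2%}")
```

Because the samples differ only in variance, the $F$-test spends its whole $\alpha$ budget in the direction that matters, while the KS test spreads its sensitivity over every kind of distributional difference.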

Source: Link, Question Author: Darcy, Answer Author: Wrzlprmft
