In leave-one-out cross-validation (LOOCV), each of the training sets looks very similar to the others, differing in only one observation. When you want to estimate the test error, you take the average of the errors over the folds. That average has a high variance.
Is there a mathematical formula, a visualization, or an intuitive way to understand why that average has higher variance than in $k$-fold cross-validation?
The original version of this answer missed the point (that is when it received a couple of downvotes). The answer was fixed in October 2015.
This is a somewhat controversial topic.
It is often claimed that LOOCV has higher variance than $k$-fold CV, and that this is because the training sets in LOOCV overlap more. The reasoning goes that this makes the estimates from different folds more dependent than in $k$-fold CV, and hence increases the overall variance. See, for example, this quote from *The Elements of Statistical Learning* by Hastie et al. (Section 7.10.1):
> What value should we choose for $K$? With $K = N$, the cross-validation estimator is approximately unbiased for the true (expected) prediction error, but can have high variance because the $N$ “training sets” are so similar to one another.
See also a similar quote in the answer by @BrashEquilibrium (+1). The accepted and most upvoted answers in the thread “Variance and bias in cross-validation: why does leave-one-out CV have higher variance?” give the same reasoning.
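The overlap argument can be made a little more concrete with the textbook formula for the variance of an average of correlated terms. As a simplifying assumption (not something Hastie et al. state), suppose the $K$ fold errors $e_1, \dots, e_K$ each have variance $\sigma^2$ and every pair has the same correlation $\rho$. Then

$$
\operatorname{Var}\Big(\frac{1}{K}\sum_{k=1}^{K} e_k\Big)
= \frac{1}{K^2}\Big(K\sigma^2 + K(K-1)\,\rho\,\sigma^2\Big)
= \rho\,\sigma^2 + \frac{(1-\rho)\,\sigma^2}{K}.
$$

Moving to LOOCV ($K = N$) plausibly increases $\rho$, which is the usual argument. But it also increases $K$ and changes $\sigma^2$ (each fold error is then computed on a single held-out point instead of being averaged over about $N/K$ points), so the formula by itself does not say in which direction the total variance moves.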
HOWEVER, note that Hastie et al. do not give any citations, and while this reasoning sounds plausible, I would like to see direct evidence that it is indeed the case. One reference that is sometimes cited is Kohavi (1995), but I don’t find it very convincing on this particular claim.
MOREOVER, here are two simulations showing that LOOCV has either the same or slightly lower variance than 10-fold CV:
- Does $K$-fold CV with $K=N$ (LOO) provide the MOST or LEAST variable estimates, and what is the role of “stability”?
- See also the paper linked in https://stats.stackexchange.com/a/252031, which calls the idea that LOOCV has high variance a “misconception”.
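For anyone who wants to experiment, here is a minimal sketch of this kind of simulation. It is not a reproduction of the linked simulations. The setup is an illustrative assumption: simple linear regression with $y = 1 + 2x + \varepsilon$, Gaussian noise, $N = 50$, squared-error loss, and 200 simulated datasets. The function names `fit_line`, `cv_mse` and `simulate` are hypothetical. For each dataset it computes both the LOOCV and the 10-fold CV estimate of the prediction error, then reports the variance of each estimate across datasets.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cv_mse(xs, ys, k, rng):
    """k-fold CV estimate of the squared prediction error (k = len(xs) is LOOCV)."""
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)  # random fold assignment; irrelevant when k == n
    folds = [idx[j::k] for j in range(k)]
    sq_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    # Mean over all n held-out predictions; with equal-sized folds this equals
    # the average of the per-fold mean squared errors.
    return statistics.fmean(sq_errors)


def simulate(n=50, reps=200, seed=0):
    """Variance, across simulated datasets, of the LOOCV and 10-fold CV estimates.

    Hypothetical data-generating process: x ~ Uniform(-1, 1),
    y = 1 + 2x + Gaussian noise with standard deviation 1.
    """
    rng = random.Random(seed)
    loo, tenfold = [], []
    for _ in range(reps):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
        loo.append(cv_mse(xs, ys, n, rng))
        tenfold.append(cv_mse(xs, ys, 10, rng))
    return statistics.variance(loo), statistics.variance(tenfold)


if __name__ == "__main__":
    v_loo, v_10 = simulate()
    print(f"Variance of LOOCV estimates:   {v_loo:.5f}")
    print(f"Variance of 10-fold estimates: {v_10:.5f}")
```

No particular outcome is claimed here. The ranking may depend on the model, the sample size and the noise level, so it is worth varying them. Note also that the 10-fold estimate includes extra variability from the random assignment of points to folds, which LOOCV does not have.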