# Stability of cross-validation in Bayesian models

I’m fitting a Bayesian HLM in JAGS using k-fold cross-validation (k=5). I’d like to know whether estimates of parameter $\beta$ are stable across all folds. What’s the best way to do this?

One idea is to take the difference of the posterior draws of $\beta$ between folds and check whether 0 lies in the 95% credible interval of that difference. In other words, is 0 in the 95% interval of $\beta_{k=1}-\beta_{k=2}$? (Then repeat for all pairs of folds.)
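The first idea can be sketched in a few lines of numpy. The draws below are simulated stand-ins (hypothetical means and scales) for what you would actually extract from the JAGS samples of $\beta$ in two folds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the MCMC draws of beta from two folds;
# in practice these come from the JAGS output for each fold.
beta_fold1 = rng.normal(loc=0.50, scale=0.10, size=4000)
beta_fold2 = rng.normal(loc=0.52, scale=0.10, size=4000)

# Difference of posteriors: pair up independent draws from the two folds.
diff = beta_fold1 - beta_fold2

# 95% credible interval of the difference; "stable" if it covers 0.
lo, hi = np.percentile(diff, [2.5, 97.5])
covers_zero = lo <= 0.0 <= hi
print(f"95% CI of beta_1 - beta_2: [{lo:.3f}, {hi:.3f}], covers 0: {covers_zero}")
```

For all pairs of folds you would loop this over the $\binom{k}{2}$ combinations; note the usual multiple-comparisons caveat applies when k is large.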

Another idea is to treat the posteriors from each fold as different MCMC chains, and to
compute Gelman’s $\hat{R}$ (Potential Scale Reduction Factor) across these pseudo-chains.
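The second idea can be sketched with a hand-rolled classic $\hat{R}$ (the split-chain refinement is omitted for brevity); the per-fold posteriors are again simulated stand-ins rather than real JAGS output:

```python
import numpy as np

def gelman_rhat(chains):
    """Classic potential scale reduction factor across m chains of length n.

    chains: array of shape (m, n), one row per pseudo-chain
    (here, one row per fold's posterior draws of beta).
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
# Hypothetical: 5 folds' posterior draws of beta, 2000 draws each.
folds = rng.normal(loc=0.5, scale=0.1, size=(5, 2000))
rhat = gelman_rhat(folds)
print(f"R-hat across folds: {rhat:.3f}")  # values near 1.0 suggest stability
```

If the folds genuinely disagree about $\beta$, the between-fold variance inflates $\hat{R}$ well above 1.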

Is one of these preferable, and are there alternatives?

## Answer

I don’t know whether this qualifies as a comment or as an answer; I’m putting it here because it feels like an answer.

In k-fold cross-validation you partition your data into k groups. In the standard (“textbook”) setup, rows are assigned to each of the k bins uniformly at random.
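That uniform-random assignment can be sketched as follows (the row count and k = 5 here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows, k = 20, 5

# Textbook k-fold: shuffle the row indices, then label them 0..k-1 in
# round-robin order, which yields k near-equal uniformly random bins.
fold_of = rng.permutation(n_rows) % k
bins = [np.flatnonzero(fold_of == j) for j in range(k)]
print([len(b) for b in bins])  # each bin holds n_rows / k rows
```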

When I speak of data, I think of each row as a sample and each column as a dimension. I’m used to using various methods to determine variable (column) importance.

What if you, as a thought exercise, departed from the textbook uniform-random assignment and instead determined which rows were important? Maybe a row informs a single variable, but maybe it informs several. Are some rows less important than others? Many of the points may be informative, or only a few.

Knowing the importance of each row, perhaps you could bin rows by importance. For example, you could make a single bin containing the most important samples; its size could define your “k”. In this way you would be comparing the “most informative” kth bucket against the others, and against the least informative bucket.

This could give you an idea of the maximal variation of your model parameters. It is only one form.
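One way to sketch that importance-based binning; the per-row importance scores here are hypothetical (in practice they might come from, say, leave-one-out influence on the estimate of $\beta$):

```python
import numpy as np

rng = np.random.default_rng(7)
n_rows, k = 100, 5

# Hypothetical per-row importance scores; in practice these might be
# derived from each row's leave-one-out influence on beta-hat.
importance = rng.exponential(scale=1.0, size=n_rows)

# Sort rows by importance and slice into k contiguous bins, so bin 0
# holds the least informative rows and bin k-1 the most informative.
order = np.argsort(importance)
bins = np.array_split(order, k)
least_informative, most_informative = bins[0], bins[-1]
print(len(least_informative), len(most_informative))
```

Fitting the model with the most-informative bucket held out, versus the least-informative one, would then bracket the parameter variation described above.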

A second way of splitting the k buckets is by the magnitude and direction of a row’s influence. You could put samples that sway a parameter (or parameters) in one direction into one bucket, and samples that sway the same parameter(s) in the opposite direction into a different bucket.

The parameter variation in this form might give a wider sweep to the model parameters, based not on information density, but on information breadth.
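That direction-of-influence split might be sketched like this; the signed per-row influence scores are hypothetical stand-ins (e.g., the change in $\hat{\beta}$ when the row is left out):

```python
import numpy as np

rng = np.random.default_rng(3)
n_rows = 100

# Hypothetical signed influence of each row on the estimate of beta.
influence = rng.normal(size=n_rows)

# Bucket rows by the direction in which they sway the parameter.
pushes_up = np.flatnonzero(influence > 0)
pushes_down = np.flatnonzero(influence <= 0)
print(len(pushes_up), len(pushes_down))
```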

Best of luck.

Attribution
Source: Link, Question Author: Jack Tanner, Answer Author: EngrStudent