I’m trying to use k-fold cross validation for model selection for a mixed-effect model (fitted with the
But, what exactly do I use as the score for each fold? Presumably I don’t just fit each candidate model to the validation subset, calculating new coefficients based on the new data. If I understand correctly, I’m supposed to score the models according to how well a model with coefficients calculated using the training data fits the validation data.
But how does one calculate AIC, BIC, logLik, adjusted $R^2$, etc. on an artificial model that gets its coefficients from one source and its data from another? With so many people advocating cross-validation, I thought there would be more information and code available for calculating the scores by which models will be compared. I can’t be the first one trying to cross-validate lme fits in R, yet I see absolutely nothing about what to use as the score… how does everyone else do this? What am I overlooking?
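I’ve mostly seen cross-validation used in a machine-learning context where one thinks in terms of a loss function that one is trying to minimize. The natural loss function associated with linear models is mean squared error (on a fixed data set this carries the same information as $R^2$, since $R^2 = 1 - \mathrm{MSE}/\widehat{\mathrm{Var}}(y)$). Calculating this for test data is very simple: predict the held-out responses using the coefficients estimated on the training data, then average the squared differences between predictions and observations.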
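To make the fold score concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: it uses ordinary least squares as a stand-in for an lme fit, the data are synthetic, and the helper names `fit_ols` and `cv_mse` are made up for this example. The point is the shape of the loop: coefficients come from the training folds, the squared errors come from the held-out fold, and the candidate with the lowest average held-out MSE is preferred.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def cv_mse(X, y, k=5, seed=0):
    """Average held-out MSE over k folds (hypothetical helper for this sketch)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_ols(X[train], y[train])   # coefficients from training rows only
        resid = y[test] - X[test] @ coef     # errors on rows the fit never saw
        scores.append(np.mean(resid ** 2))
    return float(np.mean(scores))


# Synthetic data (assumption): y depends on the first predictor only;
# the remaining five columns are pure noise.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=(n, 6))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=1.0, size=n)

intercept = np.ones((n, 1))
X_null = intercept                           # intercept-only candidate
X_small = np.hstack([intercept, x[:, :1]])   # intercept + the real predictor
X_big = np.hstack([intercept, x])            # intercept + all six predictors

mse_null = cv_mse(X_null, y)
mse_small = cv_mse(X_small, y)
mse_big = cv_mse(X_big, y)

print("intercept only:", mse_null)
print("one predictor: ", mse_small)
print("six predictors:", mse_big)
```

For a mixed-effects model the loop would look the same, with the fitting and prediction steps swapped for the mixed-model equivalents. Whether to hold out whole groups or individual rows within groups is a separate design decision that this sketch does not address.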
You could also use other loss functions (mean absolute error, rank correlation, etc.). However, since the linear model is fitted by minimizing squared error (equivalently, maximizing $R^2$), it might be advisable to try a different model in this case, one that directly optimizes whatever criterion you chose (e.g. median regression, a special case of quantile regression, for the mean absolute error).
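As a rough illustration of swapping the score, the functions below compute several of these criteria from held-out observations and predictions. This is a sketch, not a reference implementation: the function names are invented for the example, the four data points are made up, and the rank correlation assumes there are no ties.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, using the mean of the held-out responses."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def spearman(y_true, y_pred):
    """Spearman rank correlation; assumes no tied values (no tie correction)."""
    rank_true = np.argsort(np.argsort(y_true))
    rank_pred = np.argsort(np.argsort(y_pred))
    return float(np.corrcoef(rank_true, rank_pred)[0, 1])


# Made-up held-out responses and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.3, 2.9, 4.2])

scores = {
    "mse": mse(y_true, y_pred),
    "mae": mae(y_true, y_pred),
    "r2": r_squared(y_true, y_pred),
    "spearman": spearman(y_true, y_pred),
}
print(scores)
```

Note that MSE and MAE are minimized while $R^2$ and rank correlation are maximized, so the direction of "better" has to be tracked when comparing candidates across folds.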