# Independence of residuals in a computer-based experiment/simulation?

I conducted a computer-based assessment of different methods of fitting a particular type of model used in the palaeo sciences. I had a large-ish training set, so I set aside a test set via stratified random sampling. I fitted each of $m$ different methods to the training samples, used the $m$ resulting models to predict the response for the test samples, and computed an RMSEP over the samples in the test set. This is a single run.

I then repeated this process a large number of times, each time choosing a different training set by randomly sampling a new test set.
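For concreteness, a single run looks schematically like the following. This is only a sketch: `dat`, `strat`, `response`, and `fitMethod()` are hypothetical stand-ins for my data frame, stratification variable, response column, and a wrapper around each fitting method.

    ## one run: stratified random test-set split, fit the m methods,
    ## and return an RMSEP for each
    oneRun <- function(dat, methods, p = 0.25) {
        ## sample a proportion p of each stratum into the test set
        test.idx <- unlist(lapply(split(seq_len(nrow(dat)), dat$strat),
                                  function(i) sample(i, ceiling(p * length(i)))))
        train <- dat[-test.idx, ]
        test  <- dat[ test.idx, ]
        sapply(methods, function(m) {
            fit  <- fitMethod(m, train)              # hypothetical wrapper
            pred <- predict(fit, newdata = test)
            sqrt(mean((test$response - pred)^2))     # RMSEP on the test set
        })
    }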

Having done this, I want to investigate whether any of the $m$ methods has better or worse RMSEP performance. I would also like to perform pair-wise multiple comparisons between the methods.

My approach has been to fit a linear mixed effects (LME) model with a single random effect for Run. I used lmer() from the lme4 package to fit my model and functions from the multcomp package to perform the multiple comparisons. My model was essentially

    lmer(RMSEP ~ method + (1 | Run), data = FOO)


where method is a factor indicating which method was used to generate the model predictions for the test set and Run is an indicator for each particular Run of my “experiment”.
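In case it is useful, the multiple comparisons step looks essentially like this (a sketch assuming the fitted model is stored in `m1`; `glht()` and `mcp()` are from multcomp):

    library(lme4)
    library(multcomp)

    m1 <- lmer(RMSEP ~ method + (1 | Run), data = FOO)

    ## all pair-wise comparisons between the levels of method,
    ## with a Tukey-style adjustment for multiplicity
    summary(glht(m1, linfct = mcp(method = "Tukey")))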

My question is in regard to the residuals of the LME. Given the single random effect for Run, I am assuming that the RMSEP values within a run are correlated to some degree (via the correlation induced by the random effect) but are uncorrelated between runs.
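Spelling that assumption out, the random intercept implies a compound-symmetric covariance within each run. Writing $y_{ij}$ for the RMSEP of method $j$ in run $i$, the model is

$$
y_{ij} = \mu + \alpha_j + b_i + \varepsilon_{ij}, \qquad b_i \sim N(0, \sigma^2_b), \quad \varepsilon_{ij} \sim N(0, \sigma^2),
$$

so that

$$
\operatorname{Cov}(y_{ij}, y_{i'j'}) =
\begin{cases}
\sigma^2_b + \sigma^2 & i = i',\ j = j' \\
\sigma^2_b & i = i',\ j \ne j' \\
0 & i \ne i'.
\end{cases}
$$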

Is this assumption of independence between runs valid? If not, is there a way to account for this in the LME model, or should I be looking to employ another type of statistical analysis to answer my question?