Should I include random effects in a model even if they aren’t statistically significant? I have a repeated measures experimental design, in which each individual experiences three different treatments in random order. I would like to control for any effects of individual and order, but neither seems to be statistically significant in my models. Does that make it OK to exclude them, or should I still include them?
My recommendation is to include the random effects in the model even if they are not statistically significant, on the grounds that the statistical analysis then more faithfully represents the actual study design.
This allows you to write something like this in your Statistical Methods section:
> Random effects were included for individual and order to control for possible dependence due to repeated measures or order effects.
This will probably forestall reviewer comments about dependence assumptions or pseudo-replication. It is just easier to do this than to “explain” why it is okay to drop those terms, even if they seem essentially useless.
Also, having those terms in the model is probably not costing you anything. I would be surprised and suspicious if the results changed dramatically when you removed them.
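To see why dramatic changes would be surprising, note that in a balanced repeated-measures design the estimated treatment *differences* are the same whether or not you account for subjects; only the standard errors change. A minimal numpy sketch with invented data (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects, n_treatments = 10, 3

# Invented data: each subject experiences every treatment once (balanced design)
subject_effect = rng.normal(0, 2, size=n_subjects)   # subject-level intercepts
treatment_effect = np.array([0.0, 1.0, 2.5])         # fixed treatment effects
y = (subject_effect[:, None] + treatment_effect[None, :]
     + rng.normal(0, 0.5, size=(n_subjects, n_treatments)))

# Treatment means ignoring subjects entirely
raw_means = y.mean(axis=0)

# Treatment means after removing each subject's own mean
centered = y - y.mean(axis=1, keepdims=True)
within_means = centered.mean(axis=0)

# In a balanced design the differences between treatments agree;
# only the standard errors (not computed here) differ between analyses.
print(np.diff(raw_means))      # consecutive treatment differences, raw
print(np.diff(within_means))   # the same differences, subject-centered
```

The point estimates coincide because centering subtracts the same grand mean from every treatment; what the random effects buy you is the correct accounting of uncertainty, not different estimates.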
Here are some considerations:
Sometimes the distribution of the data does not allow the model to be fitted. This can happen when very few trials are collected by design (because of cost, time, or effort), when the data are too sparse in some way, or when the distribution of the data turns out to be degenerate or too flat.
In that case you may have no way to proceed other than simplifying the model, perhaps dramatically. I usually try first to drop the effects at the finest granularity, since there tend to be more of them to estimate.
In the worst case, you might wish to proceed as though the data were collected independently. This may be better than nothing, but significance tests are going to have to be taken with a big grain of salt. The interpretation of the results should be hedged quite a bit.
In some situations it might be reasonable to pool terms in order to get enough information to proceed. Here I am thinking more about experimental design in ongoing research and development than about analysis for publication.
Lorenzen and Anderson (1993) give “sometimes pooling” rules for the case where it would be helpful to get more precise tests of other factors in the model.
- A term in the model is declared negligible, and a candidate for removal from the model and the EMS (expected mean squares) column, if it is insignificant at the $\alpha=0.25$ level.
- A term should not be removed from the model if a higher-order interaction involving that term is significant at the $\alpha=0.25$ level.
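The two rules above can be mechanized. This is only a sketch: the term names and $p$-values below are hypothetical, and I use the `A:B` notation for interactions as an assumed convention.

```python
ALPHA = 0.25

def factors(term):
    """Factors involved in a term, e.g. 'A:B' -> {'A', 'B'}."""
    return set(term.split(":"))

def pooling_candidates(p_values):
    """Return terms negligible at ALPHA with no significant higher-order
    interaction involving them (the 'sometimes pooling' rules)."""
    candidates = []
    for term, p in p_values.items():
        if p < ALPHA:               # rule 1: the term itself must be insignificant
            continue
        blocked = any(              # rule 2: no significant higher-order term
            factors(term) < factors(other) and p_other < ALPHA
            for other, p_other in p_values.items()
        )
        if not blocked:
            candidates.append(term)
    return candidates

# Hypothetical ANOVA p-values, for illustration only
p_values = {"A": 0.01, "B": 0.60, "C": 0.40, "A:B": 0.03, "B:C": 0.70}
print(pooling_candidates(p_values))   # ['C', 'B:C']
```

Note that `B`, although insignificant on its own, is kept because the interaction `A:B` involving it is significant at the $0.25$ level.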
Again, though, this type of rule is more for practical use and not for publication use, in my opinion.
Now, it may be that you do in fact get essentially “identical” results when you drop those random effects. That is nice, but be aware that you are then fitting two different models, and the terms may need to be interpreted differently even though they appear to be “the same”.
What I would take from that is that the results are robust under various assumptions. That’s always a good thing.
Dropping terms can also be thought of as part of “model selection”, “model building”, or “model simplification”. There are many methodologies for model selection. While “drop the terms with insignificant $p$-values” is one such method, it does not have much theoretical support in general, and I am not sure how the various methodologies fare with mixed models.
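One common alternative to $p$-value-based dropping is comparison by an information criterion such as AIC, which trades off fit against the number of estimated parameters. A sketch with hypothetical log-likelihood values (the numbers are invented for illustration):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fitted log-likelihoods: the fuller model fits somewhat
# better but pays a penalty for two extra variance parameters.
full_model = aic(log_likelihood=-120.0, n_params=8)     # with both random effects
reduced_model = aic(log_likelihood=-123.0, n_params=6)  # random effects dropped

print(full_model, reduced_model)  # 256.0 258.0 -> the fuller model is preferred
```

One caveat specific to mixed models: REML likelihoods are not comparable across models with different fixed effects, so for such comparisons the models should be refit by maximum likelihood first.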
Also, depending on how you want to interpret the results from your model, you may not wish to “simplify” it. Littell et al. (2006) have a short discussion (p. 211) of narrow versus wide inference and population-wide versus subject-specific inference in a simple setting. In your case you are probably interested in broad inference: conclusions that pertain to the entire population rather than just to the individuals in your study.
Anyway, in your case, your study was run in a way that introduced potential for dependence based on order and individuals. If you can accurately model the structure of your study then you should.
Littell, Milliken, Stroup, Wolfinger, and Schabenberger (2006). *SAS for Mixed Models*. SAS Institute.
Lorenzen and Anderson (1993). *Design of Experiments: A No-Name Approach*. Marcel Dekker.