Help me build a linear mixed model in R using this structure

My study measured virus titres in leaves over time. I took repeated samples from 10 subject plants, with sampling repeated on 4 dates. On each date, leaves were sampled at 3 spaced loci up the shoots (basal, medial, distal).

10 subjects x 4 dates x 3 loci = 120 samples.

Here’s how the data looks, using the mean of subjects for each combination of locus x date. Error bars are standard error.
[Figure: visualized data]

Experimental questions:

  1. Do virus titres decrease in leaves as we sample further up the shoots? (Spatial aspect)
  2. Is there a trend over time? (Temporal aspect)

Mixed model variables:

Response = Titre

Fixed effects = Locus, Date

Grouping factor = Subject

Initial concerns: Each subject is measured on each date. Does this make my design fully crossed? How do I address this in the lmer (from the lme4 package) formula?

I believe my best model will be a random-slope and random-intercept model. As such, this is the formula I’ve conjured:

test.lmer <- lmer(titre ~ date + (1+date|locus), data = raw.leaf)

which works, but produces the warning “boundary (singular) fit: see ?isSingular”.

What am I doing right, what am I doing wrong? And what key information have I left out, if any?

Addendum for clarification

By repeated sampling, I mean new leaves were sampled from the same subject plants over time. To minimize leaf-to-leaf variation, three leaves per subject x locus combination were sampled and pooled for composite extraction.

Answer

Answer No. 1:

Since nobody else has contributed to this thread so far, I wanted to chime in with some further thoughts.

Based on your clarifications, it looks like you have a 2-level hierarchical data structure. The top level of this structure (level 1) consists of the 10 plants. The bottom level of this structure (level 0) consists of 12 “samples” per plant, where a “sample” is a pool of 3 leaves taken from one of 3 loci on one of 4 consecutive dates. Given a plant, the 4 * 3 = 12 samples are distinct from each other.
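To make this structure concrete, here is a minimal sketch of what such a data frame might look like. The column names plant, date, locus and viral_load are assumptions for illustration, not taken from your actual data:

library(lme4)

# Hypothetical layout: 10 plants x 4 dates x 3 loci = 120 rows,
# one row per pooled "sample"
set.seed(1)
yourdata <- expand.grid(
  plant = factor(1:10),
  date  = factor(1:4),
  locus = factor(c("basal", "medial", "distal"))
)
yourdata$viral_load <- rnorm(nrow(yourdata))  # placeholder response
str(yourdata)  # 120 obs. of 4 variables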

Level 0 of your data hierarchy is the level where the response variable is measured – a viral load of some kind. At this same level, the values of two predictor variables are also recorded: Date and Locus.

With this in mind, one potential model you can consider would look like this:

model0 <- lmer(viral_load ~ date*locus + (1|plant), data = yourdata) 

This model assumes that viral_load is a continuous variable (rather than a count variable). The plant is the random grouping factor (which you called subject).
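If the titres were instead recorded as counts, the analogue would be a generalized linear mixed model fitted with glmer(); a sketch under that assumption (viral_count is a hypothetical column name):

# Hypothetical count-response analogue of model0:
model0_pois <- glmer(viral_count ~ date*locus + (1|plant),
                     family = poisson, data = yourdata)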

The use of the terminology “trend” seems questionable to me for this design. Given a plant and a locus, we would need to measure viral load on the same “sample” over time to speak of a “trend”; here, however, we are measuring viral load on different “samples” taken over time from the same locus within the same plant. So the language used to frame the questions of interest should be softened to reflect this: rather than a “trend over time”, we can speak of comparisons between time points (i.e., dates). In particular, it may be interesting to compare the average viral load across all “samples” represented by those included in the study between Dates 2 and 1, Dates 3 and 1, and Dates 4 and 1 for a particular locus of a “typical” plant.
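One way to obtain those comparisons, assuming the emmeans package and that date is coded as a factor, is:

library(emmeans)

# Estimated marginal means for each date, separately by locus
emm <- emmeans(model0, ~ date | locus)

# Each later date vs. Date 1 (treatment-vs-control contrasts)
contrast(emm, method = "trt.vs.ctrl", ref = 1)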

The model above could be expanded if necessary to include random slopes for date and/or locus and/or their interaction; for example:

model1 <- lmer(viral_load ~ date*locus + (1 + date*locus|plant), 
               data = yourdata) 

Answer No. 2:

In principle, given your study design, you can consider models with random effects specified as:

model1 <- lmer(viral_load ~ date * locus + (1 + date*locus|plant), 
               data = yourdata) 

model2 <- lmer(viral_load ~ date * locus + (1 + date + locus|plant), 
               data = yourdata) 

model3 <- lmer(viral_load ~ date * locus + (1 + date|plant), 
               data = yourdata) 

model4 <- lmer(viral_load ~ date * locus + (1 + locus|plant), 
               data = yourdata) 


model5 <- lmer(viral_load ~ date * locus + (1|plant), 
               data = yourdata) 

This nice blog post, Random effects: Should I stay or should I go? (https://hlplab.wordpress.com/2009/05/14/random-effect-structure/#comment-9117) suggests that you can start from model1 (which has the most complex random effects structure) and then proceed to simplify the random effects structure as much as feasible (if at all).

For the above sequence of models, note that model3 and model4 are each nested within model2, but model4 is NOT nested within model3 or vice versa. Furthermore, model5 is nested within both model3 and model4. So you need to be careful to set up the likelihood ratio test so that it refers to two nested models. If the p-value of the test is “small”, then the data provide evidence that the larger model (that is, the model which includes the additional random slope) is appropriate for the data.

For example, do you really need to include random slopes for date, locus and their interaction as in model1 or can you get away with including random slopes for just date and locus as in model2? To this end, you can perform a likelihood ratio test comparing the two nested models (provided both models are fitted using REML) by means of the anova() function. In this situation, model2 is nested inside model1 since the latter includes an additional random slope compared to the former, so you can use anova(model2, model1).
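One detail worth knowing: lme4’s anova() method refits the models with ML by default before comparing them, so to keep the REML fits (appropriate here, since the two models share the same fixed effects) you can pass refit = FALSE:

# Likelihood ratio test of model2 (reduced) vs model1 (full);
# refit = FALSE keeps the REML fits rather than refitting with ML
anova(model2, model1, refit = FALSE)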

See also the section on Model Comparison in Florian Jaeger’s notes on Fitting, Evaluating, and Reporting Mixed Models:

https://wiki.bcs.rochester.edu/HlpLab/StatsCourses?action=AttachFile&do=get&target=Groningen11.pdf

The performance package also has some nice functionality for comparing all of the above models in one go (rather than comparing sets of two nested models):

library(performance)

compare_performance(model1, model2, model3, model4, model5)

As explained at https://easystats.github.io/performance/, this package also computes a general index of model performance:

compare_performance(model1, model2, model3, model4, model5, 
                    rank = TRUE)

and produces a visualization of the model performance with respect to various indexes of performance:

library(see) 

plot(compare_performance(model1, model2, model3, model4, model5, 
                         rank = TRUE))

If you go via this second route, it’s good to keep in mind what Florian Jaeger stated in his notes:

“Models can be compared for performance using any goodness-of-fit measures. Generally, an advantage in one measure comes with advantages in others, as well.”

He also emphasizes in his notes that the likelihood ratio test is for nested models only.
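In practice, this means that the non-nested pair above (model3 and model4) should not be compared with anova(); information criteria are one alternative, e.g. via the performance package already loaded above:

# model3 and model4 are not nested in each other, so a likelihood ratio
# test is not appropriate; compare them on AIC/BIC instead
compare_performance(model3, model4)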

Addendum:

Dieter, I wanted to address your comment:

Second, I still don’t know how to properly formulate nested or crossed models with the random slopes term.

Imagine two different versions of a study, Version 1 and Version 2, in which subjects are asked to rate images on two different occasions. Let’s say the two occasions are before and after seeing the image.

Version 1: Each of the subjects rates multiple images before and after seeing them and the images are kept the same across all subjects.

Version 2: Each of the subjects rates multiple images before and after seeing them, but the images are unique to each subject; no two subjects see the same image.

I borrowed the idea for this study from https://m-clark.github.io/mixed-models-with-R/extensions.html.

Crossed Model

Version 1 of the study will give rise to a fully crossed linear mixed effects model, since we can treat subject and image as fully crossed random grouping factors. This model will look like this:

lmer(rating ~ occasion + (1|subject) + (1|image)) 

Now, for each subject, let’s say that you measure their stress level before and after they rate an image.

At a particular occasion, the effect of the stress covariate on the rating of a “typical” image can be allowed to vary across subjects via this syntax:

lmer(rating ~ occasion + stress + (1 + stress|subject) + (1|image)) 

If your model were to only look like this:

lmer(rating ~ occasion + stress + (1|subject) + (1|image)) 

that would mean that, at a particular occasion, the effect of the stress covariate on the rating of a “typical” image is assumed the same across subjects.

In this setting, think of subjects and images as forming two side-by-side piles: a subject pile and an image pile. You then get to combine each subject from the subject pile with each image from the image pile. This means that:

subject 1   gets the entire image pile

subject 2   gets the entire image pile

etc.

There is no hierarchy in your data as far as the random grouping factors are concerned. You can measure subject-level covariates on each occasion and/or image-level covariates on each occasion. The effects of subject-level covariates can be allowed to vary across subjects, while those of image-level covariates can be allowed to vary across images.

Nested Model

Version 2 of the study will give rise to a nested linear mixed effects model, since we can treat image as a random grouping factor nested within the random grouping factor subject. An initial model will look like this:

lmer(rating ~ occasion + (1|subject) + (1|subject:image)) 
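As an aside, lme4 also accepts a nesting shorthand that expands to exactly these two terms:

# (1|subject/image) is shorthand for (1|subject) + (1|subject:image)
lmer(rating ~ occasion + (1|subject/image))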

Again, for each subject, let’s say that you measure their stress level before and after they rate an image in Version 2 of your study. If you add stress as a covariate in your model, you can allow its effect on rating to vary across images (nested in subjects) like so:

lmer(rating ~ occasion + stress + (1|subject) + (1 + stress|subject:image)) 

The key with nested models is to first explicitly write out the random grouping factors, as seen here: (1|subject) and (1|subject:image). Then, once you figure out at what level of your nested data hierarchy the covariate is measured, you can tuck it in the appropriate place in your model if it can have varying effects:

   subject                            <- level 2 (random grouping factor)

   image (nested in subject)          <- level 1 (random grouping factor)

   rating (measured on 2 occasions)   <- level 0 (response level) 

In a nested setup, if you measure a covariate at the lowest level of your data hierarchy (level 0; e.g., stress), its effect can vary across the levels of the random grouping factor sitting right above it in the data hierarchy.

If you measure a covariate at the next lowest level of your data hierarchy (level 1), its effect can vary across the levels of the random grouping factor sitting right above that level of the data hierarchy. An example of such a covariate would be an image characteristic that is invariant to occasion (e.g., image size).

Covariates measured at the highest level of your data hierarchy (e.g., subject gender) cannot have varying effects in your model.
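Putting these three rules together in a single sketch (img_size and gender are hypothetical covariate names used only for illustration):

# stress   (level 0): slope may vary across subject:image
# img_size (level 1): slope may vary across subject
# gender   (level 2): fixed effect only, no varying slope
lmer(rating ~ occasion + stress + img_size + gender +
       (1 + img_size|subject) + (1 + stress|subject:image))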

Attribution
Source: Link, Question Author: Dieter Kahl, Answer Author: Isabella Ghement
