In a mixed effects model the recommendation is to use a fixed effect to estimate a parameter if all possible levels are included (e.g., both males and females). It is further recommended to use a random effect to account for a variable if the levels included are just a random sample from a population (enrolled patients from the universe of possible patients) and you want to estimate the population mean and variance instead of the means of the individual factor levels.
I am wondering if you are logically obliged to always use a fixed effect in this manner. Consider a study on how foot / shoe size changes through development and is related to, say, height, weight and age. Side clearly must be included in the model somehow to account for the fact that the measurements over the years are nested within a given foot and are not independent. Moreover, right and left are all the possibilities that can exist. In addition, it may well be true that for a given participant their right foot is larger (or smaller) than their left. However, although foot size does differ somewhat between the two feet for all people, there is no reason to believe that right feet will on average be larger than left feet. If they are in your sample, this is presumably due to something about the genetics of the people in your sample, rather than something intrinsic to right-foot-ness. Finally, side seems like a nuisance parameter, not something you really care about.
Let me note that I made this example up. It may not be any good; it is just to get the idea across. For all I know, having a large right foot and a small left foot was necessary for survival in the paleolithic.
In a case like this, would it make (more / less / any) sense to incorporate side in the model as a random effect? What would be the pros and cons of using a fixed vs. random effect here?
The general problem with “fixed” and “random” effects is that they are not defined in a consistent way. Andrew Gelman quotes several definitions:
(1) Fixed effects are constant across individuals, and random effects
vary. For example, in a growth study, a model with random intercepts
$a_i$ and fixed slope $b$ corresponds to parallel lines for different
individuals $i$, or the model $y_{it} = a_i + bt$. Kreft and De Leeuw
(1998) thus distinguish between fixed and random coefficients.
(2) Effects are fixed if they are interesting in themselves or random
if there is interest in the underlying population. Searle, Casella,
and McCulloch (1992, Section 1.4) explore this distinction in depth.
(3) “When a sample exhausts the population, the corresponding variable
is fixed; when the sample is a small (i.e., negligible) part of the
population the corresponding variable is random.” (Green and Tukey,
(4) “If an effect is assumed to be a realized value of a random
variable, it is called a random effect.” (LaMotte, 1983)
(5) Fixed effects are estimated using least squares (or, more
generally, maximum likelihood) and random effects are estimated with
shrinkage (“linear unbiased prediction” in the terminology of
Robinson, 1991). This definition is standard in the multilevel
modeling literature (see, for example, Snijders and Bosker, 1999,
Section 4.2) and in econometrics.
and notes that they are not consistent. In his book Data Analysis Using Regression and Multilevel/Hierarchical Models he generally avoids those terms and instead describes intercepts and slopes as fixed or as varying between groups, because
Fixed effects can be viewed as special cases of random effects, in
which the higher-level variance (in model (1.1), this would be
$\sigma^2_\alpha$) is set to 0 or $\infty$. Hence, in our framework,
all regression parameters are “random,” and the term “multilevel” is
This is especially true in a Bayesian framework (commonly used for mixed models), where all the effects are random per se. If you are thinking in Bayesian terms, you are not really concerned with “fixed” effects and point estimates, and you have no problem with treating all the effects as random.
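To make the "variance set to 0 or ∞" remark concrete, here is a sketch of the usual approximate partial-pooling estimate for a varying intercept. Only $\sigma^2_\alpha$ appears in the quote; the other symbols are my own notation, introduced for illustration: $n_j$ is the number of observations in group $j$, $\bar{y}_j$ the group mean, $\bar{y}_{\text{all}}$ the overall mean, and $\sigma^2_y$ the within-group variance, with both variances treated as known.

```latex
\hat{\alpha}_j \approx
\frac{\dfrac{n_j}{\sigma^2_y}\,\bar{y}_j + \dfrac{1}{\sigma^2_\alpha}\,\bar{y}_{\text{all}}}
     {\dfrac{n_j}{\sigma^2_y} + \dfrac{1}{\sigma^2_\alpha}}

% Limiting cases of the between-group variance:
\sigma^2_\alpha \to 0:      \quad \hat{\alpha}_j \to \bar{y}_{\text{all}}
  \quad \text{(complete pooling: one common intercept)}
\sigma^2_\alpha \to \infty: \quad \hat{\alpha}_j \to \bar{y}_j
  \quad \text{(no pooling: a separate intercept per group)}
```

Under these assumptions, both limits correspond to what is usually called a "fixed" specification, and anything in between is the shrinkage estimate from definition (5).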
The more I read on this topic, the more I am convinced that this is largely an ideological discussion about what we can (or should) estimate and what we can only predict (here I could also refer to your own answer). You use random effects when your levels are a random sample of the possible levels, so you are not concerned with individual estimates and care about population-level effects rather than individual ones. So the answer to your question also depends on whether you think you want to, or can, estimate fixed effects given your data. If all the possible levels are included in your data, you can estimate fixed effects. Also, as in your example, the number of levels may be small, which is generally a poor basis for estimating random effects, since there are some minimal requirements on the number of levels for this.
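The point about a small number of levels can be illustrated with a small simulation. This is only a sketch under generous assumptions: the level effects are drawn from a normal distribution and are observed directly, without measurement noise, which is more favourable than any real study. The function name and all parameter values are made up for illustration.

```python
import random
import statistics


def variance_estimates(n_levels, true_sd=1.0, n_reps=2000, seed=0):
    """Repeatedly draw n_levels effects from N(0, true_sd^2) and return
    the sample variance obtained in each replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        effects = [rng.gauss(0.0, true_sd) for _ in range(n_levels)]
        estimates.append(statistics.variance(effects))
    return estimates


# Two levels (e.g. left/right side) versus one hundred levels (e.g. schools).
few = variance_estimates(n_levels=2)
many = variance_estimates(n_levels=100)

# Spread of the variance estimates across replicates.
spread_few = statistics.stdev(few)
spread_many = statistics.stdev(many)

print("mean / spread of variance estimate, 2 levels:  ",
      statistics.fmean(few), spread_few)
print("mean / spread of variance estimate, 100 levels:",
      statistics.fmean(many), spread_many)
```

Both settings recover the true variance on average, but with two levels a single estimate rests on one degree of freedom and can land almost anywhere. This is one practical argument for treating a two-level factor such as side as fixed.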
Best case scenario argument
Say you have unlimited amounts of data and unlimited computational power. In this case you could imagine estimating every effect as fixed, since fixed effects give you more flexibility (they let you compare the individual effects). However, even in this case, most of us would be reluctant to use fixed effects for everything.
For example, imagine that you want to model exam results of schools in some region and you have data on all 100 schools in the region. In this case you could treat schools as fixed, since you have data on all the levels, but in practice you would probably rather think of them as random. Why is that?
One reason is that in cases like this you are generally not interested in the effects of individual schools (and it is hard to compare all of them), but rather in the general variability between schools.
Another argument here is model parsimony. Generally you are not interested in an “every possible influence” model, so you include a few fixed effects that you want to test and control for the other possible sources of variability. This makes mixed effects models fit the general way of thinking about statistical modeling, where you estimate something and control for other things. With complicated (multilevel or hierarchical) data you have many effects to include, so you treat some as “fixed” and some as “random” in order to control for them.
In this scenario you also wouldn’t think of each school as having its own unique influence on the results, but rather of schools as having some influence in general. So the argument would be that we believe it is not really possible to estimate the unique effects of individual schools, and so we treat them as a random sample of possible school effects.
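The schools example can be sketched in code as well. The simulation below is hypothetical: the numbers (100 schools, 5 pupils each, the two standard deviations) are invented, and both variances are treated as known, which a real mixed model would have to estimate. It compares the raw per-school means (the "fixed" view) with means shrunk toward the overall mean (the "random" view) in terms of how close they come to the simulated true school effects.

```python
import random
import statistics

rng = random.Random(42)

N_SCHOOLS = 100      # all schools in the hypothetical region
PUPILS = 5           # pupils sampled per school (assumed equal)
TAU = 5.0            # between-school SD (assumed known)
SIGMA = 15.0         # within-school SD (assumed known)
GRAND_MEAN = 60.0    # true average exam score

# Simulate true school effects and the observed per-school mean scores.
true_effects = [rng.gauss(GRAND_MEAN, TAU) for _ in range(N_SCHOOLS)]
school_means = [
    statistics.fmean([rng.gauss(mu, SIGMA) for _ in range(PUPILS)])
    for mu in true_effects
]

# Shrinkage weight on each school's own mean (precision weighting).
overall = statistics.fmean(school_means)
weight = (PUPILS / SIGMA**2) / (PUPILS / SIGMA**2 + 1 / TAU**2)
shrunk = [weight * m + (1 - weight) * overall for m in school_means]

# Mean squared error of each set of estimates against the simulated truth.
mse_raw = statistics.fmean(
    [(m - t) ** 2 for m, t in zip(school_means, true_effects)])
mse_shrunk = statistics.fmean(
    [(s - t) ** 2 for s, t in zip(shrunk, true_effects)])

print("MSE of raw school means:   ", mse_raw)
print("MSE of shrunk school means:", mse_shrunk)
```

With few pupils per school, the raw means are noisy and the shrunk estimates track the simulated truth more closely on average. This matches the intuition that individual school effects are hard to pin down even when every school is in the data.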
Mixed effects models sit somewhere between the “everything fixed” and “everything random” scenarios. The data we encounter make us lower our expectations about estimating everything as fixed effects, so we decide which effects we want to compare and which effects we want to control for, or about whose influence we only want a general impression. It is not only about what the data are, but also about how we think of the data while modeling them.