Should “City” be a fixed or a random effect variable?

I am analyzing data on “BloodSugar” level (the dependent variable) and trying to find its relation to the “age”, “gender” and “weight” (independent variables) of the subjects. The subjects were sampled in four cities, recorded in a “city” variable.

Should I treat the “city” variable as a fixed effect or as a random effect?

So which is correct:

lm(bloodsugar ~ age + gender + weight + city, mydata)

or:

lmer(bloodsugar ~ age + gender + weight + (1|city), mydata)  # lmer() is from the lme4 package

Thanks for your help.

Edit: In response to the comment by @Dave, I would like to add the following: currently there are no data on the relation between my real dependent variable and city, so a relation could exist. The relation with city is not my primary objective, but it would be nice to determine it as well, if proper statistical methods make that feasible.

Answer

I would advise fitting both. Hopefully they will tell you the same thing. If not, that would be very interesting!

Conceptually, city should be random. You are not specifically interested in estimates for each city for your research question, and your sample of cities can be thought of as coming from a wider population of cities. These are good reasons to treat it as random.

The problem is that you only have 4 cities, so you are asking the software to estimate the variance of a normally distributed variable from only 4 observations, and that estimate may not be very reliable.
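To get a feel for how unreliable that can be, here is a minimal base R sketch. It is an idealized illustration, not the mixed model itself: it assumes the four city effects could be observed directly and simply looks at how much their sample standard deviation varies from one hypothetical study to the next. The true SD of 1, the seed, and the number of simulations are arbitrary assumptions.

```r
# Sketch: how noisy is a standard deviation estimated from only 4 draws?
set.seed(1)          # arbitrary seed, for reproducibility only
true_sd  <- 1        # assumed true between-city SD (hypothetical value)
n_cities <- 4
n_sims   <- 10000

# For each simulated study, draw 4 city effects and compute their sample SD
sd_hat <- replicate(n_sims, sd(rnorm(n_cities, mean = 0, sd = true_sd)))

# Spread of the estimates across simulated studies
print(quantile(sd_hat, c(0.025, 0.5, 0.975)))
```

From the chi-squared distribution with 3 degrees of freedom, the middle 95% of such estimates runs from roughly a quarter of the true SD to nearly twice it. In a real mixed model the city effects are additionally blurred by residual noise, so the situation is, if anything, worse.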

It is perfectly valid to fit city as a fixed effect, and this will control for the non-independence of observations within each city. In that case you are treating it a bit like a confounder. The reason for using random intercepts instead is that with many cities, fixed effects become inconvenient and lose statistical power, because each additional city costs another estimated parameter.

So with only 4, I would do both.
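As a rough sketch of what "doing both" could look like, the following fits the two models from the question and puts the shared coefficients side by side. The data here are simulated purely for illustration (the city labels, effect sizes, sample size and noise level are all made-up assumptions), and the lme4 package is assumed to be installed; with real data you would skip the simulation and use your own `mydata`.

```r
library(lme4)  # provides lmer(); assumed to be installed

# --- Simulated stand-in for the real data (all values are hypothetical) ---
set.seed(42)
n_per_city <- 50
city <- factor(rep(c("A", "B", "C", "D"), each = n_per_city))
n <- length(city)

mydata <- data.frame(
  city   = city,
  age    = runif(n, 20, 70),
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  weight = rnorm(n, mean = 75, sd = 12)
)

# Made-up city-level shifts in blood sugar
city_shift <- c(A = -5, B = 0, C = 3, D = 6)
mydata$bloodsugar <- 80 + 0.3 * mydata$age + 0.2 * mydata$weight +
  unname(city_shift[as.character(mydata$city)]) + rnorm(n, sd = 8)

# --- The two models from the question ---
fit_fixed  <- lm(bloodsugar ~ age + gender + weight + city, data = mydata)
fit_random <- lmer(bloodsugar ~ age + gender + weight + (1 | city), data = mydata)

# Compare the coefficients the two models have in common
shared <- c("age", "genderM", "weight")
comparison <- cbind(
  fixed  = coef(fit_fixed)[shared],
  random = fixef(fit_random)[shared]
)
print(comparison)
```

If the two columns agree closely, the choice between fixed and random city effects matters little for the age, gender and weight estimates. If they disagree noticeably, that is the "very interesting" case worth investigating.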

Attribution
Source: Link, Question Author: rnso, Answer Author: Robert Long
