# Why does treatment coding result in a correlation between random slope and intercept?

Consider a within-subject and within-item factorial design where the experimental treatment variable has two levels (conditions). Let `m1` be the maximal model and `m2` the no-random-correlations model.

```
m1: y ~ condition + (condition|subject) + (condition|item)
m2: y ~ condition + (1|subject) + (0 + condition|subject) + (1|item) + (0 + condition|item)
```

Dale Barr states the following for this situation:
Edit (4/20/2018): As Jake Westfall pointed out, the following statements seem to refer only to the datasets shown in Fig. 1 and 2 on this website. However, the key point remains the same.

In a deviation-coding representation (condition: -0.5 vs. 0.5), `m2` allows distributions in which subjects’ random intercepts are uncorrelated with subjects’ random slopes. Only the maximal model `m1` allows distributions in which the two are correlated.

In the treatment-coding representation (condition: 0 vs. 1), these distributions, in which subjects’ random intercepts are uncorrelated with subjects’ random slopes, cannot be fitted using the no-random-correlations model, because in each case the random slope and intercept are correlated in the treatment-coding representation.

Why does treatment coding always result in a correlation between random slope and intercept?

Treatment coding doesn’t always or necessarily result in intercept/slope correlation, but it tends to more often than not. It’s easiest to see why this is the case using pictures, and considering the case of a continuous rather than categorical predictor.

Here’s a picture of a normal-looking clustered dataset with approximately 0 correlation between the random intercepts and random slopes:

[Figure: clustered dataset with group-specific regression lines, predictor X on its original scale]

But now look what happens when we shift the predictor X far to the right by adding 3 to each X value:

[Figure: the same dataset after adding 3 to each X value]

It’s the same dataset in a deep sense (if we zoomed in on the data points, it would look identical to the first plot, but with the X axis relabeled), but simply by shifting X we’ve induced an almost perfect negative correlation between the random intercepts and random slopes.

This happens because when we shift X, we redefine the intercepts of each group. Remember that the intercepts always refer to the Y-values where the group-specific regression lines cross X=0. But now the X=0 point is far away from the center of the data. So we’re essentially extrapolating outside the range of the observed data in order to compute the intercepts. The result, as you can see, is that the greater the slope is, the lower the intercept is, and vice versa.
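For readers without the plots, the same effect can be reproduced numerically. The following is a minimal sketch, not the code behind the original figures: it assumes 2000 hypothetical groups whose intercepts and slopes are drawn independently with standard deviation 1, and the helper names `pearson` and `shifted_intercepts` are made up for illustration. It works directly with the group-specific lines rather than fitting a mixed model.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def shifted_intercepts(intercepts, slopes, shift):
    """Intercepts after replacing X by X + shift.

    The line y = a + b*X is the same line as y = (a - b*shift) + b*(X + shift),
    so the slope is unchanged and the intercept moves by -b*shift.
    """
    return [a - b * shift for a, b in zip(intercepts, slopes)]

rng = random.Random(1)   # fixed seed (arbitrary choice) for reproducibility
n_groups = 2000          # hypothetical number of clusters

# Group-specific lines on the original scale: independent intercepts and slopes.
intercepts = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
slopes = [rng.gauss(1.0, 1.0) for _ in range(n_groups)]

r_original = pearson(intercepts, slopes)
r_shifted = pearson(shifted_intercepts(intercepts, slopes, 3.0), slopes)

print("original scale:", round(r_original, 2))
print("X shifted by 3:", round(r_shifted, 2))
```

Under these assumed variances the first correlation should be close to 0 and the second strongly negative (the population value is -3/sqrt(10), roughly -0.95), even though the lines themselves have not changed.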

When you use treatment coding, it’s like doing a less extreme version of the X-shifting depicted in the bottom graph. This is because the treatment codes {0,1} are just a shifted version of the deviation codes {-0.5, 0.5}, where a shift of +0.5 has been added. Edit 2018-08-29: this is now illustrated more clearly and directly in the second figure of this more recent answer of mine to another question.
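The shift argument can also be written out algebraically. The notation here is introduced only for this sketch and is not from the original answer: for group $i$, $a_i^{\mathrm{dev}}$ and $a_i^{\mathrm{trt}}$ are the intercepts under deviation and treatment coding, and $b_i$ is the slope, which is the same under both codings. Since $x^{\mathrm{trt}} = x^{\mathrm{dev}} + 0.5$,

$$
\begin{aligned}
y &= a_i^{\mathrm{dev}} + b_i\, x^{\mathrm{dev}} \\
  &= a_i^{\mathrm{dev}} + b_i \left(x^{\mathrm{trt}} - 0.5\right) \\
  &= \left(a_i^{\mathrm{dev}} - 0.5\, b_i\right) + b_i\, x^{\mathrm{trt}},
\end{aligned}
$$

so $a_i^{\mathrm{trt}} = a_i^{\mathrm{dev}} - 0.5\, b_i$, and therefore

$$
\operatorname{Cov}\!\left(a^{\mathrm{trt}}, b\right) = \operatorname{Cov}\!\left(a^{\mathrm{dev}}, b\right) - 0.5 \operatorname{Var}(b).
$$

If the intercepts and slopes are uncorrelated under deviation coding and the slopes vary at all, the covariance under treatment coding is $-0.5 \operatorname{Var}(b) < 0$. More generally, for a shift of $c$ and uncorrelated intercepts and slopes with standard deviations $\sigma_a$ and $\sigma_b$ on the unshifted scale, the correlation on the shifted scale is

$$
\operatorname{Corr}\!\left(a - c\,b,\; b\right) = \frac{-c\,\sigma_b}{\sqrt{\sigma_a^2 + c^2 \sigma_b^2}},
$$

which approaches $-1$ as $c$ grows. This is why a shift of 3 looks so much more dramatic than a shift of 0.5.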

Like I said earlier, this is not true by necessity. It’s possible to have a dataset similar to the above, but where the slopes and intercepts are uncorrelated on the shifted scale (where the intercepts refer to points far away from the data) and correlated on the centered scale. But the group-specific regression lines in such datasets will tend to exhibit “fanning out” patterns that, in practice, are just not that common in the real world.
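The reverse case can be sketched the same way. The example below is an illustration under made-up assumptions, not the author's data: intercepts and slopes are independent on the shifted scale (intercept at X = 0), and the data are assumed to sit around X = 3. The names `pearson`, `variance`, and `spread_at` are hypothetical helpers.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def variance(values):
    """Sample variance."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

rng = random.Random(2)   # fixed seed (arbitrary choice)
n_groups = 2000          # hypothetical number of clusters
center = 3.0             # assumed center of the observed X values

# Independent intercepts and slopes on the SHIFTED scale (intercept at X = 0).
intercepts = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
slopes = [rng.gauss(1.0, 1.0) for _ in range(n_groups)]

# Intercepts on the centered scale are the line heights at X = center.
centered_intercepts = [a + b * center for a, b in zip(intercepts, slopes)]

r_shifted = pearson(intercepts, slopes)
r_centered = pearson(centered_intercepts, slopes)

def spread_at(x):
    """Variance across groups of the line heights a + b*x at a given x."""
    return variance([a + b * x for a, b in zip(intercepts, slopes)])

# Spread of the group lines across the assumed data range, X = 2 to X = 4.
spreads = [spread_at(x) for x in (2.0, 3.0, 4.0)]

print("shifted scale:", round(r_shifted, 2))
print("centered scale:", round(r_centered, 2))
print("spread at X = 2, 3, 4:", [round(s, 1) for s in spreads])
```

Here the correlation is near 0 on the shifted scale but strongly positive on the centered scale, and the spread of the lines grows steadily across the data range, which is the “fanning out” pattern described above.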