# Dummy coding for contrasts: 0,1 vs. 1,-1

I’m seeking your help in understanding the difference between two different contrasts for dichotomous variables.

At http://www.psychstat.missouristate.edu/multibook/mlt08.htm, under “Dichotomous Predictor Variables”, there are two ways to code dichotomous predictors: using the contrast 0,1 or the contrast 1,-1. I kind of understand the distinction here (0,1 is dummy coding and 1,-1 adds to one group and subtracts from the other) but don’t understand which to use in my regression.

For example, if I have two dichotomous predictors, gender (m/f) and athlete (y/n), I could use contrasts 0,1 on both or 1,-1 on both. What would be the interpretation of a main effect or an interaction effect when using the two different contrasts? Does it depend on whether my cells are of different sizes?

“Dichotomous Predictor Variables”, there are two ways to code dichotomous predictors: using the contrast 0,1 or the contrast 1,-1.

This is factually wrong. There is no limit to the number of ways they can be coded. Those two are merely the most common (indeed between them, almost ubiquitous), and probably the easiest to deal with.
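To illustrate that the coding is a free choice, here is a minimal sketch using NumPy least squares. The data values and the unusual {m=3, f=7} coding are invented purely for illustration; the point is only that any two distinct codes give the same fitted group means, while the coefficients change meaning.

```python
import numpy as np

# Hypothetical data (invented): three males followed by three females.
y = np.array([10.0, 12.0, 14.0, 20.0, 22.0, 24.0])
is_female = np.array([0, 0, 0, 1, 1, 1])

def fit(code_m, code_f):
    """Least-squares fit of y on an intercept plus a two-valued gender code."""
    x = np.where(is_female == 1, code_f, code_m).astype(float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

beta_dummy, fitted_dummy = fit(0, 1)   # the usual 0,1 coding
beta_odd, fitted_odd = fit(3, 7)       # an arbitrary coding, chosen at random

print("0,1 coefficients:", beta_dummy)
print("3,7 coefficients:", beta_odd)
print("same fitted values:", np.allclose(fitted_dummy, fitted_odd))
```

The fitted values agree under both codings; only the interpretation of the intercept and slope differs, which is why convenience of interpretation drives the choice.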

I kind of understand the distinction here (0,1 is dummy coding and 1,-1 adds to one group and subtracts from the other) but don’t understand which to use in my regression.

Whichever is more convenient/appropriate. If you have a designed experiment with equal numbers in each group, there are some nice aspects to the second approach; if you don’t, the first is probably easier in several ways.

For example, if I have two dichotomous predictors, gender (m/f) and athlete (y/n), I could use contrasts 0,1 on both or 1,-1 on both.

What would be the interpretation of a main effect or an interaction effect when using the two different contrasts?

a) (i) Consider a gender main effect (without interaction for simplicity) {m=0, f=1} – then the coefficient corresponding to that dummy will measure the difference in mean between females and males (and the intercept would be the mean of the males).

(ii) For {m=-1, f=1} the gender main effect is half the difference in mean, and the intercept is the average of the two group means (if the design is balanced it is also the average of all the data). Equivalently, the main effect is the distance of each group mean from the intercept: it is added for females and subtracted for males.
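A small numerical check of (i) and (ii), as a sketch with invented, deliberately unbalanced data (two males, four females). The helper name `coefficients` is hypothetical.

```python
import numpy as np

# Hypothetical unbalanced data (invented): two males, four females.
y = np.array([10.0, 14.0, 20.0, 22.0, 24.0, 22.0])
is_female = np.array([0, 0, 1, 1, 1, 1])

def coefficients(code_m, code_f):
    """Return [intercept, gender coefficient] for the given gender coding."""
    x = np.where(is_female == 1, code_f, code_m).astype(float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

mean_m = y[is_female == 0].mean()
mean_f = y[is_female == 1].mean()

beta_dummy = coefficients(0, 1)    # intercept = male mean, slope = f - m
beta_effect = coefficients(-1, 1)  # intercept = mean of the two means, slope = (f - m) / 2

print("group means (m, f):", mean_m, mean_f)
print("0,1 coding:", beta_dummy)
print("-1,1 coding:", beta_effect)
print("overall mean of the data:", y.mean())
```

Because the groups are unequal in size here, the intercept under the -1,1 coding matches the average of the two group means but not the overall mean of the data.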

b) (i) consider an interaction between gender{m=0,f=1} and athlete {n=0,y=1}

Now the intercept represents the mean of the male non-athletes (0,0). The gender main effect is the difference between the means of the female non-athletes and the male non-athletes, and the athlete main effect is the difference between the means of the male athletes and the male non-athletes. The interaction is a difference of two differences: it’s the mean athlete/non-athlete difference for females minus the mean athlete/non-athlete difference for males.
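The 0,1 case can be checked directly from cell means. This is a sketch with invented cell means; since the model with an interaction has four parameters for four cells, the coefficients can be obtained by solving a 4x4 linear system.

```python
import numpy as np

# Hypothetical cell means (invented), keyed by (gender, athlete)
# with gender: m=0, f=1 and athlete: n=0, y=1.
cell_means = {(0, 0): 50.0, (1, 0): 54.0, (0, 1): 60.0, (1, 1): 70.0}

# One design row per cell: intercept, gender, athlete, gender*athlete.
X = np.array([[1.0, g, a, g * a] for (g, a) in cell_means])
y = np.array(list(cell_means.values()))

b0, b_gender, b_athlete, b_inter = np.linalg.solve(X, y)

# The same quantities written as differences of cell means.
athlete_diff_f = cell_means[(1, 1)] - cell_means[(1, 0)]
athlete_diff_m = cell_means[(0, 1)] - cell_means[(0, 0)]

print("intercept (male non-athlete mean):", b0)
print("gender (female - male, non-athletes):", b_gender)
print("athlete (athlete - non-athlete, males):", b_athlete)
print("interaction:", b_inter, "vs", athlete_diff_f - athlete_diff_m)
```

With these made-up means the coefficients come out at about 50, 4, 10 and 6, so each "main effect" here is really a simple effect within the reference level of the other factor.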

(ii) consider an interaction between gender {m=-1,f=1} and athlete {n=-1,y=1}

Now the intercept represents the mean of the four group means (and if the design is completely balanced it is also the overall mean). The interaction coefficient is a quarter of what it was before.

The main effects are built from averages of differences. The gender coefficient is half the average of the female-male difference within athletes and the female-male difference within non-athletes. The athlete coefficient is half the average of the athlete/non-athlete difference within females and the athlete/non-athlete difference within males.
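A sketch of the -1,1 case, again with invented cell means. Each coefficient is compared with the corresponding combination of cell means: in this coding the fitted main-effect coefficients are half the average difference, and the interaction coefficient is a quarter of the difference of differences.

```python
import numpy as np

# Hypothetical cell means (invented), keyed by (gender, athlete)
# with gender: m=-1, f=1 and athlete: n=-1, y=1.
cell_means = {(-1, -1): 50.0, (1, -1): 54.0, (-1, 1): 60.0, (1, 1): 70.0}

# One design row per cell: intercept, gender, athlete, gender*athlete.
X = np.array([[1.0, g, a, g * a] for (g, a) in cell_means])
y = np.array(list(cell_means.values()))
beta = np.linalg.solve(X, y)  # [intercept, gender, athlete, interaction]

# Differences computed directly from the cell means.
female_minus_male = [cell_means[(1, a)] - cell_means[(-1, a)] for a in (-1, 1)]
athlete_minus_non = [cell_means[(g, 1)] - cell_means[(g, -1)] for g in (-1, 1)]
diff_of_diffs = athlete_minus_non[1] - athlete_minus_non[0]

print("intercept:", beta[0], "mean of cell means:", y.mean())
print("gender:", beta[1], "half avg f-m difference:", np.mean(female_minus_male) / 2)
print("athlete:", beta[2], "half avg y-n difference:", np.mean(athlete_minus_non) / 2)
print("interaction:", beta[3], "quarter of diff of diffs:", diff_of_diffs / 4)
```

With these made-up means the coefficients are about 58.5, 3.5, 6.5 and 1.5.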

Does it depend on whether my cells are of different sizes?

What do you mean by ‘different sizes’? Do you mean that the number of observations in each cell is different? (If so, I largely addressed that above: equal cell numbers give additional meanings and simplify the interpretation, such as making the intercept the grand mean of the data rather than just the mean of the group means.)
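The effect of unequal cell counts can be seen in a short sketch. The observations below are invented, with 2, 1, 3 and 2 observations in the four cells, and the model is fitted by ordinary least squares with the -1,1 coding and an interaction term.

```python
import numpy as np

# Hypothetical unbalanced data (invented): observations per (gender, athlete) cell,
# coded gender m=-1, f=1 and athlete n=-1, y=1.
cells = {
    (-1, -1): [48.0, 52.0],
    (1, -1): [54.0],
    (-1, 1): [58.0, 60.0, 62.0],
    (1, 1): [69.0, 71.0],
}

# One design row per observation: intercept, gender, athlete, gender*athlete.
rows, obs = [], []
for (g, a), values in cells.items():
    for v in values:
        rows.append([1.0, g, a, g * a])
        obs.append(v)

X, y = np.array(rows), np.array(obs)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

grand_mean = y.mean()
mean_of_cell_means = np.mean([np.mean(v) for v in cells.values()])

print("intercept:", beta[0])
print("mean of cell means:", mean_of_cell_means)
print("grand mean of the data:", grand_mean)
```

Here the intercept agrees with the unweighted mean of the four cell means, while the grand mean of the data differs because the larger cells pull it toward their values.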