I am currently working on a regression model in which all independent variables are categorical (factor) variables. My dependent variable is a logit-transformed ratio.

It is fairly easy to run an ordinary regression in R, since R automatically dummy-codes variables of type “factor”. However, this type of coding uses one category of each variable as the baseline, which makes the coefficients hard to interpret.

My professor has told me to use effect coding (-1 or 1) instead, as this makes the intercept the grand mean.

Does anyone know how to handle that?

Until now I have tried:

```
gm <- mean(tapply(ds$ln.crea, ds$month, mean))
model <- lm(ln.crea ~ month + month*month + year + year*year, data = ds, contrasts = list(gm = contr.sum))

Call:
lm(formula = ln.crea ~ month + month * month + year + year * year, data = ds, contrasts = list(gm = contr.sum))

Residuals:
     Min       1Q   Median       3Q      Max
-0.89483 -0.19239 -0.03651  0.14955  0.89671

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.244493   0.204502 -15.865   <2e-16 ***
monthFeb    -0.124035   0.144604  -0.858   0.3928
monthMar    -0.365223   0.144604  -2.526   0.0129 *
monthApr    -0.240314   0.144604  -1.662   0.0993 .
monthMay    -0.109138   0.144604  -0.755   0.4520
monthJun    -0.350185   0.144604  -2.422   0.0170 *
monthJul     0.050518   0.144604   0.349   0.7275
monthAug    -0.206436   0.144604  -1.428   0.1562
monthSep    -0.134197   0.142327  -0.943   0.3478
monthOct    -0.178182   0.142327  -1.252   0.2132
monthNov    -0.119126   0.142327  -0.837   0.4044
monthDec    -0.147681   0.142327  -1.038   0.3017
year1999     0.482988   0.200196   2.413   0.0174 *
year2000    -0.018540   0.200196  -0.093   0.9264
year2001    -0.166511   0.200196  -0.832   0.4073
year2002    -0.056698   0.200196  -0.283   0.7775
year2003    -0.173219   0.200196  -0.865   0.3887
year2004     0.013831   0.200196   0.069   0.9450
year2005     0.007362   0.200196   0.037   0.9707
year2006    -0.281472   0.200196  -1.406   0.1625
year2007    -0.266659   0.200196  -1.332   0.1855
year2008    -0.248883   0.200196  -1.243   0.2164
year2009    -0.153083   0.200196  -0.765   0.4461
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3391 on 113 degrees of freedom
Multiple R-squared:  0.3626,	Adjusted R-squared:  0.2385
F-statistic: 2.922 on 22 and 113 DF,  p-value: 0.0001131
```

**Answer**

In principle, there are two types of contrast coding for which the intercept estimates the Grand Mean: *sum contrasts* and *repeated contrasts* (sliding differences).

Here’s an example data set:

```
set.seed(42)
x <- data.frame(a = c(rnorm(100,2), rnorm(100,1),rnorm(100,0)),
b = rep(c("A", "B", "C"), each = 100))
```

The conditions’ means:

```
tapply(x$a, x$b, mean)
          A           B           C
 2.03251482  0.91251629 -0.01036817
```

The Grand Mean:

```
mean(tapply(x$a, x$b, mean))
[1] 0.978221
```
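Note that the Grand Mean computed this way is the unweighted mean of the level means. In the balanced example above it coincides with the plain mean of all observations, but with unequal group sizes the two differ. The following is a minimal sketch on hypothetical unbalanced data (the data frame `u` and its group sizes are assumptions made for illustration); it uses the sum coding introduced below to show which of the two means the intercept reproduces.

```r
# Hypothetical unbalanced data: 10, 20 and 70 observations in the three levels.
set.seed(42)
u <- data.frame(a = c(rnorm(10, 2), rnorm(20, 1), rnorm(70, 0)),
                b = factor(rep(c("A", "B", "C"), times = c(10, 20, 70))))

unweighted <- mean(tapply(u$a, u$b, mean))  # mean of the three level means
weighted   <- mean(u$a)                     # plain mean of all observations

fit <- lm(a ~ b, data = u, contrasts = list(b = contr.sum))

# The sum-coded intercept matches the unweighted mean, not the weighted one.
c(intercept = unname(coef(fit)[1]), unweighted = unweighted, weighted = weighted)
```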

You can specify the type of contrast coding with the `contrasts` parameter in `lm`.
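To see what a coding scheme does before fitting anything, it can help to print its contrast matrix. The sketch below uses base R only and rebuilds the example data with `b` stored explicitly as a factor (an assumption needed for `contrasts<-`). It also shows an alternative to the `contrasts` argument: attaching the coding to the factor itself.

```r
# Contrast matrix for a three-level factor under sum (effect) coding.
# Rows are factor levels, columns are the coded predictors; the last level is coded -1.
cm <- contr.sum(3)
print(cm)

# Same example data as above, with `b` as an explicit factor.
set.seed(42)
x <- data.frame(a = c(rnorm(100, 2), rnorm(100, 1), rnorm(100, 0)),
                b = factor(rep(c("A", "B", "C"), each = 100)))

# Alternative to the `contrasts` argument: store the coding on the factor.
contrasts(x$b) <- contr.sum(3)
fit <- lm(a ~ b, data = x)
coef(fit)
```

Storing the coding on the factor is convenient when several models use the same variable; passing `contrasts` to `lm` keeps the choice local to one model.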

**Sum contrasts**

```
lm(a ~ b, x, contrasts = list(b = contr.sum))
Coefficients:
(Intercept)           b1           b2
     0.9782       1.0543      -0.0657
```

The intercept is the Grand Mean. The first slope is the difference between the first factor level and the Grand Mean. The second slope is the difference between the second factor level and the Grand Mean.
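The third factor level has no coefficient of its own. Under sum coding the deviations from the Grand Mean add up to zero, so the deviation of the last level is the negative sum of the others; with the coefficients printed above that is -(1.0543 - 0.0657) = -0.9886. A minimal sketch of this check, rebuilding the example data with `b` as a factor:

```r
set.seed(42)
x <- data.frame(a = c(rnorm(100, 2), rnorm(100, 1), rnorm(100, 0)),
                b = factor(rep(c("A", "B", "C"), each = 100)))
fit <- lm(a ~ b, data = x, contrasts = list(b = contr.sum))

# Deviations of levels A and B from the grand mean are the two estimated coefficients.
dev_ab <- coef(fit)[c("b1", "b2")]

# Sum coding constrains the deviations to add up to zero, so level C is implied.
dev_c <- -sum(dev_ab)

# Compare with the deviation computed directly from the level means.
level_means <- tapply(x$a, x$b, mean)
all.equal(unname(level_means["C"] - mean(level_means)), dev_c)
```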

**Repeated contrasts**

The function for creating repeated contrasts is part of the `MASS` package.

```
lm(a ~ b, x, contrasts = list(b = MASS::contr.sdif))
Coefficients:
(Intercept)         b2-1         b3-2
     0.9782      -1.1200      -0.9229
```

The intercept is the Grand Mean. The slopes indicate the differences between consecutive factor levels (2 vs. 1, 3 vs. 2).
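Applied to a model like the one in the question, the names in the `contrasts` list have to match the factors in the formula. In the question's call the list entry is named `gm`, which is not a term in the model, so it does not change the coding of `month` or `year`; the output there still shows baseline-coded coefficients such as `monthFeb` and `year1999`. Also, `month*month` reduces to just `month` in an R formula. The following is a rough sketch on simulated data: the variable names are borrowed from the question, but the values and the balanced layout are assumptions.

```r
# Simulated stand-in for the question's data: one observation per month/year cell.
set.seed(1)
ds <- expand.grid(month = factor(month.abb, levels = month.abb),
                  year  = factor(1998:2009))
ds$ln.crea <- rnorm(nrow(ds), mean = -3.5, sd = 0.3)

# The names in `contrasts` must match the factor names used in the formula.
model <- lm(ln.crea ~ month + year, data = ds,
            contrasts = list(month = contr.sum, year = contr.sum))

# In this balanced design the intercept equals the overall mean of the response.
c(intercept = unname(coef(model)[1]), grand_mean = mean(ds$ln.crea))
```

The coefficients `month1` to `month11` are then the deviations of January to November from the Grand Mean, and `year1` to `year11` the deviations of the first eleven years; the last level of each factor is implied as the negative sum of the others.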

**Attribution**

*Source: Link, Question Author: Kasper Christensen, Answer Author: Sven Hohenstein*