Reading about the methods and results of statistical analyses, especially in epidemiology, I very often hear about models being adjusted or controlled for certain variables.
How would you explain the purpose of that to a non-statistician? How do you interpret your results after controlling for a certain variable?
A small walk-through in Stata or R, or a pointer to one online, would be a true gem.
It's easiest to explain by way of an example:
Imagine a study finds that people who watched the World Cup final were more likely to suffer a heart attack during the match or in the subsequent 24 hours than those who didn't watch it. Should the government ban football from TV? But men are more likely to watch football than women, and men are also more likely to have a heart attack than women. So the association between football-watching and heart attacks might be explained by a third factor, such as sex, that affects both. (Sociologists would distinguish here between gender, a cultural construct that is associated with football-watching, and sex, a biological category that is associated with heart-attack incidence, but the two are clearly very strongly correlated, so I'm going to ignore that distinction for simplicity.)
Statisticians, and especially epidemiologists, call such a third factor a confounder, and the phenomenon confounding. The most obvious way to remove the problem is to look at the association between football-watching and heart-attack incidence in men and women separately, or in the jargon, to stratify by sex. If we find that the association (if there still is one) is similar in both sexes, we may then choose to combine the two estimates of the association across the two sexes. The resulting estimate of the association between football-watching and heart-attack incidence is then said to be adjusted or controlled for sex.
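You asked for Stata or R; I don't have either to hand here, but the idea is language-agnostic, so here is a plain-Python sketch with simulated data (all the probabilities are invented purely for illustration). Watching has no real effect on heart-attack risk in this simulation, yet the crude risk ratio comes out well above 1 because sex confounds the association; the sex-stratified risk ratios are close to 1.

```python
import random

random.seed(1)
n = 100_000
rows = []
for _ in range(n):
    sex = "M" if random.random() < 0.5 else "F"
    watch = random.random() < (0.8 if sex == "M" else 0.2)  # men watch more (assumed)
    p_ha = 0.03 if sex == "M" else 0.01                     # men at higher risk; watching has NO effect
    ha = random.random() < p_ha
    rows.append((sex, watch, ha))

def risk_ratio(data):
    """Risk of heart attack among watchers divided by risk among non-watchers."""
    watchers     = [ha for _, watch, ha in data if watch]
    non_watchers = [ha for _, watch, ha in data if not watch]
    return (sum(watchers) / len(watchers)) / (sum(non_watchers) / len(non_watchers))

print("crude risk ratio:", round(risk_ratio(rows), 2))      # inflated by confounding
for s in ("M", "F"):
    stratum = [r for r in rows if r[0] == s]
    print(f"risk ratio among {s}:", round(risk_ratio(stratum), 2))  # roughly 1: no real effect
```

The crude ratio is biased upwards because watchers are disproportionately men, who have the higher baseline risk; within each sex stratum that imbalance disappears, and the adjusted (stratum-specific) ratios hover around 1.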
We would probably also wish to control for other factors in the same way. Age is another obvious one (in fact epidemiologists either stratify or adjust/control almost every association by age and sex). Socio-economic class is probably another. Others can get trickier, e.g. should we adjust for beer consumption while watching the match? Maybe yes, if we're interested in the effect of the stress of watching the match alone; but maybe no, if we're considering banning broadcasting of World Cup football and that would also reduce beer consumption. Whether a given variable is a confounder or not depends on precisely what question we wish to address, and this can require very careful thought and get quite tricky and even contentious.
Clearly then, we may wish to adjust/control for several factors, some of which may be measured in several categories (e.g. social class) while others may be continuous (e.g. age). We could deal with the continuous ones by splitting them into (age-)groups, thereby turning them into categorical ones. So say we have 2 sexes, 5 social class groups and 7 age groups. We can now look at the association between football-watching and heart-attack incidence in 2×5×7 = 70 strata. But if our study is fairly small, so that some of those strata contain very few people, we're going to run into problems with this approach. And in practice we may wish to adjust for a dozen or more variables. An alternative way of adjusting/controlling for variables that is particularly useful when there are many of them is provided by regression analysis with multiple independent variables, sometimes known as multivariable regression analysis. (There are different types of regression model depending on the type of outcome variable: least squares regression, logistic regression, proportional hazards (Cox) regression…) In observational studies, as opposed to experiments, we nearly always want to adjust for many potential confounders, so in practice adjustment/control for confounders is often done by regression analysis, though there are other alternatives too, such as standardization, weighting, propensity score matching…
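To show regression adjustment in the same hand-rolled spirit (again with invented numbers, and in Python rather than the Stata/R you asked for): here older people both watch more and have a higher outcome value, while watching itself has zero true effect. Regressing the outcome on watching alone gives a spuriously large coefficient; adding age as a second independent variable in a least-squares fit (solved directly via the normal equations) shrinks the watching coefficient towards zero.

```python
import random

random.seed(2)
n = 5_000
age   = [random.uniform(20, 80) for _ in range(n)]
watch = [1.0 if random.random() < a / 100 else 0.0 for a in age]  # older people watch more (assumed)
y     = [0.5 * a + random.gauss(0, 5) for a in age]               # outcome depends on age only, NOT on watching

def ols(X, y):
    """Least-squares coefficients via the normal equations X'X b = X'y (Gaussian elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):                          # forward elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    b = [0.0] * k
    for c in reversed(range(k)):                # back substitution
        b[c] = (A[c][k] - sum(A[c][j] * b[j] for j in range(c + 1, k))) / A[c][c]
    return b

crude    = ols([[1.0, w] for w in watch], y)                    # intercept + watching
adjusted = ols([[1.0, w, a] for w, a in zip(watch, age)], y)    # intercept + watching + age
print("crude effect of watching:  ", round(crude[1], 2))        # spuriously large
print("age-adjusted effect:       ", round(adjusted[1], 2))     # close to zero
```

In R this would just be `lm(y ~ watch)` versus `lm(y ~ watch + age)`, and in Stata `regress y watch` versus `regress y watch age`; the hand-rolled solver above is only there to keep the sketch self-contained.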