The difference between average and marginal treatment effect

I have been reading some papers, and I am unclear about the specific definitions of the Average Treatment Effect (ATE) and the Marginal Treatment Effect (MTE). Are they the same?

According to Austin

A conditional effect is the average effect, at the subject level, of moving a subject from untreated to treated. The regression coefficient for a treatment assignment indicator variable from a multivariable regression model is an estimate of a conditional or adjusted effect. In contrast, a marginal effect is the average effect, at the population level, of moving an entire population from untreated to treated [10]. Linear treatment effects (differences in means and differences in proportions) are collapsible: the conditional and marginal treatment effects will coincide. However, when outcomes are binary or time to event in nature, the odds ratio and the hazard ratio are not collapsible [11]. Rosenbaum has noted that propensity score methods allow one to estimate marginal, rather than conditional, effects [12]. There is a paucity of research into the performance of different propensity score methods to estimate marginal treatment effects.

But in another Austin paper, he says

For each subject, the effect of treatment is defined to be $Y_i(1) - Y_i(0)$. The average treatment effect (ATE) is defined to be $E[Y_i(1) - Y_i(0)]$ (Imbens, 2004). The ATE is the average effect, at the population level, of moving an entire population from untreated to treated.

So my question is: what is the difference between the average treatment effect and the marginal treatment effect?

As well, how should I classify my estimand? I have a propensity score weighted (IPTW) Cox model. My only covariate is the treatment indicator. Should the resulting hazard ratio be considered the ATE or the MTE?

Edit: To add to the confusion, Guo, in his book Propensity Score Analysis, claims that the marginal treatment effect is a

…special case of the treatment effect for the people at the margin of indifference (EOTM). In some policy and practice situations, it is important to distinguish between the marginal and average returns. For instance, the average student going to college may do better (i.e. have higher grades) than the marginal student who is indifferent about going to school or not.

I feel like this should be taken with a grain of salt, because it is aimed at the social sciences (where I believe "marginal" has a different definition), but I thought I would include it here to show why I am confused.


As some of the information you provided states, the two are not the same. I prefer the terminology of conditional (on covariates) and unconditional (marginal) estimates. There is a very subtle language problem that clouds the issue greatly. Analysts who tend to love “population average effects” have a dangerous tendency to try to estimate such effects from a sample with no reference to any population distribution of subject characteristics. In this sense the estimates should not be called population average estimates but should instead be called sample average estimates. It is very important to note that sample average estimates have a low chance of being transportable to the population from which the sample came, or in fact to any population. One reason for this is the somewhat arbitrary selection criteria for how subjects get into studies.

As an example, if one compares treatment A and treatment B in a binary logistic model adjusted for sex, one obtains a treatment effect that is specific to both males and females. If the sex variable is omitted from the model, a sample average odds ratio effect for treatment is obtained. This in effect is a comparison of some of the males on treatment A with some of the females on treatment B, due to non-collapsibility of the odds ratio. If one had a population with a different female:male frequency, this average treatment effect coming from a marginal odds ratio for treatment would no longer apply.
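A minimal numerical sketch of the point above, using made-up logistic coefficients (they are assumptions chosen for illustration, not estimates from any study). It assumes treatment is independent of sex, as in a randomized trial, so the only thing that differs between the conditional and marginal odds ratios is whether sex is conditioned on. The function names are hypothetical.

```python
import math

def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# Assumed model (illustrative values only):
#   logit P(Y=1) = B0 + B_TRT * treated + B_SEX * female
B0 = -1.0
B_TRT = math.log(2.0)   # conditional (sex-adjusted) odds ratio of 2
B_SEX = 2.0             # strong sex effect on the outcome

def conditional_or(female):
    """Treatment odds ratio within one sex stratum (female = 0 or 1)."""
    p_treated = expit(B0 + B_TRT + B_SEX * female)
    p_control = expit(B0 + B_SEX * female)
    return odds(p_treated) / odds(p_control)

def marginal_or(frac_female):
    """Treatment odds ratio after averaging risks over the sex mix.

    Treatment is assumed independent of sex, so each arm has the
    same fraction of females.
    """
    p_treated = (frac_female * expit(B0 + B_TRT + B_SEX)
                 + (1.0 - frac_female) * expit(B0 + B_TRT))
    p_control = (frac_female * expit(B0 + B_SEX)
                 + (1.0 - frac_female) * expit(B0))
    return odds(p_treated) / odds(p_control)

if __name__ == "__main__":
    print("conditional OR, males:  ", conditional_or(0))
    print("conditional OR, females:", conditional_or(1))
    for frac in (0.1, 0.5, 0.9):
        print("marginal OR with", frac, "female:", marginal_or(frac))
```

Under these assumptions the conditional odds ratio is 2 in both sexes, while the marginal odds ratio is below 2 and shifts as the female:male mix changes, even though there is no confounding and no treatment-by-sex interaction in the model.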

So if one wants a quantity that pertains to individual subjects, full conditioning on covariates is required. And these conditional estimates are the ones that transport to populations, not the so-called “population average” estimates.

Another way to think about it: think of an ideal study for comparing treatment to no treatment. This would be a multi-period randomized crossover study. Then think about the next best study: a randomized trial on identical twins where one of the twins in each pair is randomly selected to get treatment A and the other is selected to get treatment B. Both of these ideal studies are mimicked by full conditioning, i.e., full covariate adjustment to get conditional and not marginal effects from the more usual parallel group randomized controlled trial.

Source: Link, Question Author: RayVelcoro, Answer Author: Frank Harrell
