T-test, ANOVA, or regression: what’s the difference?

I know this question has been asked in similar ways already, but I cannot find a suitable answer. I have three subsamples defined by programme participation (participants, drop-outs, and comparison) and want to test, for each pair of groups, whether the difference in means is significantly different from 0. So overall I have three tests: mean1 = mean2, mean2 = mean3, and mean1 = mean3.

I read that using a paired t-test and a regression would yield the same results, but that with ANOVA there is a slight difference. Does somebody know more about this, and could you suggest which one is best suited?



ANOVA vs t-tests

With ANOVA, you generally first perform an omnibus test. This is a test against the null hypothesis that all group means are equal (μ1 = μ2 = μ3).
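As a sketch of that omnibus test in Python with SciPy (the group sizes, means, and standard deviations here are invented purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical scores for the three subsamples
participants = rng.normal(55, 10, 40)
dropouts     = rng.normal(50, 10, 35)
comparison   = rng.normal(48, 10, 45)

# Omnibus one-way ANOVA: H0 is mu1 == mu2 == mu3
f_stat, p_value = stats.f_oneway(participants, dropouts, comparison)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here only tells you that *some* group means differ; it does not say which pairs, which is what the post-hoc analysis below is for.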

Only if there is sufficient evidence against this hypothesis does it make sense to run a post-hoc analysis, which is very similar to using 3 pairwise t-tests to check for individual differences. The most commonly used post-hoc test is Tukey’s Honest Significant Difference (Tukey’s HSD), and it differs from a series of t-tests in two important ways:

  • It uses the studentized range distribution instead of the t-distribution for p-values / confidence intervals;
  • It corrects for multiple testing by default.

The latter is the important part: since you are testing three hypotheses, you have an inflated chance of at least one false positive. A multiple-testing correction can also be applied to three separate t-tests, but with ANOVA + Tukey’s HSD it is done by default.

A third difference from separate t-tests is that you use all your data at once, not group by group. This can be advantageous, as it allows for easier diagnostics of the residuals. However, it also means you may have to resort to alternatives to the standard ANOVA in case the variances are not approximately equal among groups, or another assumption is violated.
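The contrast between Tukey’s HSD and a hand-corrected series of t-tests can be sketched as follows. The data are invented for illustration, and `scipy.stats.tukey_hsd` requires a reasonably recent SciPy; the by-hand branch applies a simple Bonferroni correction, which is more conservative than Tukey’s method:

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = {
    "participants": rng.normal(55, 10, 40),
    "dropouts":     rng.normal(50, 10, 35),
    "comparison":   rng.normal(48, 10, 45),
}

# Tukey's HSD: all pairwise comparisons, corrected for multiple testing by default
res = stats.tukey_hsd(*groups.values())
print(res)  # formatted table of pairwise differences and adjusted p-values

# By-hand alternative: three pairwise t-tests, then a Bonferroni correction
pairs = list(itertools.combinations(groups, 2))
for a, b in pairs:
    p = stats.ttest_ind(groups[a], groups[b]).pvalue
    print(f"{a} vs {b}: raw p = {p:.4f}, Bonferroni p = {min(1.0, p * len(pairs)):.4f}")
```

Both approaches answer the same three pairwise questions; the difference is in how the family-wise error rate is controlled.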

ANOVA vs Linear Regression

ANOVA is a linear regression in which the predictors only add constants to the intercept; there are no ‘slopes’ in the colloquial sense of the word. However, when you use linear regression with dummy variables for your three categories, you will get identical parameter estimates.

The difference is in the hypotheses you would usually test with a linear regression. Remember: in ANOVA, the tests are the omnibus test, then pairwise comparisons. In linear regression you usually test whether:

  • β0=0, testing whether the intercept is significantly non-zero;
  • βj=0, where j is each of your variables.

In case you only have one categorical variable (group), one of its categories will be absorbed into the intercept (i.e., it becomes the reference group). In that case, the tests performed by most statistical software are:

  • Is the estimate for the reference group significantly non-zero?
  • Is the estimate for (group 1) − (reference group) significantly non-zero?
  • Is the estimate for (group 2) − (reference group) significantly non-zero?

This is nice if you have a clear reference group, because you can then simply ignore the (usually meaningless) intercept p-value and only correct the other two for multiple testing. This saves you some power, because you only correct for two tests instead of three.
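The equivalence between dummy-coded regression and group means can be verified numerically. This is a minimal sketch with plain NumPy least squares (again on invented data, with ‘comparison’ as the reference group): the fitted intercept equals the reference-group mean, and each dummy coefficient equals that group’s mean difference from the reference.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical outcomes; 'comparison' serves as the reference group
comparison   = rng.normal(48, 10, 45)
participants = rng.normal(55, 10, 40)
dropouts     = rng.normal(50, 10, 35)

y = np.concatenate([comparison, participants, dropouts])
# Design matrix: intercept column plus one dummy per non-reference group
X = np.column_stack([
    np.ones(len(y)),
    np.r_[np.zeros(45), np.ones(40), np.zeros(35)],   # participants dummy
    np.r_[np.zeros(45), np.zeros(40), np.ones(35)],   # drop-outs dummy
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Intercept = reference-group mean; slopes = mean differences from the reference
assert np.isclose(beta[0], comparison.mean())
assert np.isclose(beta[1], participants.mean() - comparison.mean())
assert np.isclose(beta[2], dropouts.mean() - comparison.mean())
print(beta)
```

So the two dummy-coefficient t-tests reported by regression software are exactly the two group-vs-reference comparisons described above.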

So to summarize: if the group you call comparison is actually a control group, you might want to use linear regression instead of ANOVA. However, the three tests you say you want to do in your question resemble those of an ANOVA post-hoc analysis or three pairwise t-tests.

Source: Link, Question Author: Papayapap, Answer Author: Maarten Punt
