Explaining two-tailed tests

I am looking for various ways of explaining to my students (in an elementary statistics course) what a two-tailed test is, and how its p-value is calculated.

How do you explain the two-tailed vs. one-tailed test to your students?

This is a great question and I'm looking forward to everyone's version of explaining the p-value and the two-tailed vs. one-tailed test. I've been teaching statistics to fellow orthopaedic surgeons, so I have tried to keep it as basic as possible, since most of them haven't done any advanced math for 10-30 years.

My way of explaining how to calculate p-values & the tails

I start by explaining that if we believe we have a fair coin, we know it should land tails in 50% of the flips on average (\$=H_0\$). If you now wonder what the probability is of getting only 2 tails out of 10 flips with this fair coin, you can calculate that probability as I've done in the bar graph. From the graph you can see that the probability of getting exactly 2 tails out of 10 flips with a fair coin is \$\approx 4.4\%\$.
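As a quick check of the value read off the graph, the height of a single bar can be computed directly with R's `dbinom`. This is only a minimal sketch; the variable names are illustrative and not part of the plotting code further down.

```r
# Probability of exactly 2 tails in 10 flips of a fair coin (H0: p = 0.5)
n_flips <- 10
p_exact <- dbinom(2, size = n_flips, prob = 0.5)
round(p_exact * 100, 1)  # 4.4 (percent)

# The same value from the binomial formula choose(n, k) * p^k * (1 - p)^(n - k)
p_formula <- choose(n_flips, 2) * 0.5^2 * 0.5^(n_flips - 2)
```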

Since we would question the fairness of the coin even more if we got only 1 or 0 tails, we have to include these possibilities as well: the tail of the test. Adding the values, we get a probability of \$\approx 5.5\%\$ of getting 2 tails or fewer.
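The sum over the tail can be sketched in R either by adding the individual bars or with the cumulative function `pbinom`. The variable names here are illustrative assumptions.

```r
# One-tailed p-value: P(X <= 2) for 10 flips of a fair coin
p_lower_sum <- sum(dbinom(0:2, size = 10, prob = 0.5))  # add the three bars
p_lower_cdf <- pbinom(2, size = 10, prob = 0.5)         # cumulative equivalent
round(p_lower_sum * 100, 1)  # 5.5 (percent)
```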

Now if we got only 2 heads, i.e. 8 tails (the other tail), we would probably be just as willing to question the fairness of the coin. This means that you end up with a probability of \$5.4…\%+5.4…\% \approx 10.9\%\$ for a two-tailed test.
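To make the two-tailed sum concrete, here is a small R sketch that adds both tails and compares the result with the built-in exact test `binom.test`. The variable names are illustrative, and the simple "add both tails" approach relies on the distribution being symmetric under a fair coin.

```r
# Lower tail: 2 tails or fewer; upper tail: 8 tails or more
p_lower <- pbinom(2, size = 10, prob = 0.5)                      # P(X <= 2)
p_upper <- pbinom(7, size = 10, prob = 0.5, lower.tail = FALSE)  # P(X >= 8)
p_two_tailed <- p_lower + p_upper
round(p_two_tailed * 100, 1)  # 10.9 (percent)

# Cross-check with R's exact binomial test
p_binom_test <- binom.test(2, 10, p = 0.5, alternative = "two.sided")$p.value
```

The one-tailed value is available from the same function with `alternative = "less"`.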

Since in medicine we are usually interested in studying failures, we need to include the opposite side of the distribution even if our intent is to do good and to introduce a beneficial treatment.

Reflections slightly off topic

This simple example also shows how dependent we are on the null hypothesis when calculating the p-value. I also like to point out the resemblance between the binomial curve and the bell curve. Changing to 200 flips gives you a natural way of explaining why the probability of getting exactly 100 tails starts to lack relevance. Defining intervals of interest is then a natural transition to probability density/mass functions and their cumulative counterparts.
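A rough R sketch of the 200-flip case follows. The interval 90 to 110 is an arbitrary choice made for illustration, and the normal approximation with a continuity correction is one common way to show the link to the bell curve, not necessarily how it is presented in class.

```r
n_flips <- 200

# Probability of exactly 100 tails: a single, thin bar (roughly 0.056)
p_exactly_100 <- dbinom(100, size = n_flips, prob = 0.5)

# Probability of an interval, here 90 to 110 tails, via the cumulative function
p_interval <- pbinom(110, n_flips, 0.5) - pbinom(89, n_flips, 0.5)

# Normal approximation of the same interval, with continuity correction
mu <- n_flips * 0.5
sigma <- sqrt(n_flips * 0.5 * 0.5)
p_normal <- pnorm(110.5, mu, sigma) - pnorm(89.5, mu, sigma)
```

Comparing `p_interval` with `p_normal` shows how closely the bell curve tracks the binomial bars at this sample size.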

In my class I recommend the Khan Academy statistics videos, and I also use some of his explanations for certain concepts. The students also get to flip coins so that we can look into the randomness of coin flipping; what I try to show is that randomness is more random than we usually believe, inspired by this Radiolab episode.

The code

I usually have one graph per slide. This is the R code that I used to create the graphs:

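``````
library(graphics)

# Bar plot of the binomial distribution for x_max flips. The `edges` bars at
# each end are colored separately to mark the tails of the test.
binom_plot_function <- function(x_max, my_title = FALSE, my_prob = .5, edges = 0,
                                col = c("green", "gold", "red")) {
  barplot(
    dbinom(0:x_max, x_max, my_prob) * 100,
    # Colors in order: lower tail, middle, upper tail
    col = c(rep(col[1], edges), rep(col[2], x_max - 2 * edges + 1), rep(col[3], edges)),
    ylab = "Probability %",
    xlab = "Number of tails", names.arg = 0:x_max)
  # Only add a title when one was supplied
  if (!isFALSE(my_title)) {
    title(main = my_title)
  }
}

# Full distribution, no tails highlighted
binom_plot_function(10, paste("Flipping coins", 10, "times"), edges = 0, col = c("#449944", "gold", "#994444"))
# Lower tail (0 to 2 tails) highlighted
binom_plot_function(10, edges = 3, col = c(rgb(200/255, 0, 0), "gold", "gold"))
# Both tails highlighted
binom_plot_function(10, edges = 3, col = c(rgb(200/255, 0, 0), "gold", rgb(200/255, 100/255, 100/255)))
``````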