# How to explain hypothesis testing to teenagers in less than 10 minutes?

For over a year now I’ve been giving a one-hour “a taste for statistics” class. Each time, a different group of kids comes over to take it.

The theme of the class is an experiment in which 10 kids (who like drinking Coca-Cola) are each given two unmarked cups, one with Coca-Cola and one with Pepsi. The kids are asked to detect, based on taste and smell, which cup has the Coca-Cola.

I then need to explain to them how to decide if the kids are guessing, or if they (or at least, enough of them) really have the ability to taste the difference. Are 10 out of 10 successes good enough? What about 7 out of 10?

Even after giving this class dozens of times (in different variations), I still don’t feel I know how to get the concept across so that most of the class understands it.

If you have any ideas on how the concept of hypothesis testing, null hypothesis, alternative hypothesis, rejection regions, etc. can be explained in a simple(!) and intuitive way – I would love to know how.

I think you should start by asking them what they think it really means to say that a person is able to tell the difference between Coca-Cola and Pepsi. What can such a person do that others cannot?

Most of them will not have such a definition, and will not be able to produce one if asked. However, statistics gives that phrase a precise meaning, and that is what you can convey in your “a taste for statistics” class.

One of the points of statistics is to give an exact answer to the question: “what does it mean to say of someone that he or she is able to tell the difference between Coca-Cola and Pepsi?”

The answer is: he or she is better than a guessing machine at classifying cups in a blind test. The guessing machine cannot tell the difference, it simply guesses all the time. The guessing machine is a useful invention for us because we know that it does not have the ability. Its results are useful because they show what we should expect from someone who lacks the ability that we test for.

To test whether a person is able to tell the difference between Coca-Cola and Pepsi, one must compare his or her classifications of cups in a blind test to the classifications that a guessing machine would make. Only if s/he is better than the guessing machine is s/he able to tell the difference.

How, then, do you determine whether one result is better than another result? What if they are almost the same?

If two people classify a small number of cups, it’s not really fair to say that one is better than the other if the results are almost the same. Perhaps the winner just happened to be lucky today, and the results would have been reversed if the competition were repeated tomorrow?

If we are to have a trustworthy result, it cannot be based on a tiny number of classifications, because then chance can decide the result. Remember, you don’t have to be perfect to have the ability, you just have to be better than the guessing machine. In fact, if the number of classifications is too small, not even a person who always identifies Coca-Cola correctly will be able to show that s/he is better than the guessing machine. For example, if there is only one cup to classify, even the guessing machine has a 50 per cent chance of classifying everything correctly. That’s not good, because it means that in 50 per cent of the trials the machine matches a perfect score, so we could never credit even a flawless Coca-Cola identifier with being better than the guessing machine. Very unfair.
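To see how quickly the guessing machine’s luck runs out as cups are added, here is a minimal Python sketch. It is my own illustration, not part of the class: the function name `chance_all_correct` is made up, and it assumes every cup is an independent 50/50 guess.

```python
from fractions import Fraction


def chance_all_correct(n_cups: int) -> Fraction:
    """Chance that a pure guesser (assumed 50/50 per cup) gets every cup right."""
    return Fraction(1, 2) ** n_cups


# Show the chance of a perfect score by luck alone for a few numbers of cups.
for n in (1, 2, 5, 10):
    p = chance_all_correct(n)
    print(f"{n:2d} cups: {p} (about {float(p):.2%})")
```

With one cup the machine is perfect half of the time; with ten cups a perfect score by pure luck happens only once in 1024 attempts.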

The more cups there are to classify, the more opportunities there are for the guessing machine’s inability to be revealed, and the more opportunities for the good Coca-Cola identifier to show off.

10 cups might be a good place to start. How many right answers must a human then have to show that he or she is better than the machine?

Ask them what they would guess.
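For your own reference while they guess, the exact answer can be worked out by counting. The sketch below is an assumption-laden illustration of mine: it treats the guessing machine as a fair coin flipped once per cup, uses the conventional 5 per cent threshold, and the names `tail_probability` and `critical_value` are invented for this example.

```python
from math import comb


def tail_probability(k: int, n: int = 10, p: float = 0.5) -> float:
    """Chance that a guesser with success chance p per cup gets at least k of n right."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n: int = 10, alpha: float = 0.05) -> int:
    """Smallest score that a pure 50/50 guesser reaches with chance at most alpha.

    Returns n + 1 when even a perfect score is too easy to reach by luck.
    """
    for k in range(n + 1):
        if tail_probability(k, n) <= alpha:
            return k
    return n + 1


print("P(at least 7 of 10 by guessing):", tail_probability(7))
print("P(at least 8 of 10 by guessing):", tail_probability(8))
print("P(at least 9 of 10 by guessing):", tail_probability(9))
print("Score needed with 10 cups:", critical_value(10))
print("Score needed with 1 cup:", critical_value(1))
```

Under these assumptions a guesser reaches 7 or more about 17 per cent of the time and 8 or more about 5.5 per cent of the time, so 9 out of 10 is the first score that beats the 5 per cent threshold. With a single cup the function returns 2, a score nobody can reach, which is the one-cup unfairness in numbers.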

Then let them use the machine and find out how good it is, i.e. let all pupils generate a series of ten guesses, e.g. using a die or a random generator on a smartphone. To be pedagogical, you should prepare a series of ten right answers against which the guesses are evaluated.

Record all the results on the board, then write them out again in sorted order. Explain that a human would have to be better than 95 per cent of those results before a statistician would acknowledge his or her ability to tell the difference between Coca-Cola and Pepsi. Draw the line that separates the worst 95% of the results from the top 5%.
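If you want to rehearse the board exercise beforehand, it can be simulated. This is only a sketch under assumptions of mine: a class of 30 pupils, a randomly generated answer key, fixed seeds so the run is repeatable, and hypothetical helper names (`machine_score`, `simulate_class`, `cutoff`). The cutoff is read as "the lowest score that is strictly better than at least 95 per cent of the guessed results".

```python
import random

CUPS = ("coca-cola", "pepsi")


def machine_score(answer_key, rng):
    """Number of cups a pure guesser labels correctly against the answer key."""
    return sum(rng.choice(CUPS) == truth for truth in answer_key)


def simulate_class(n_pupils, answer_key, seed=0):
    """Sorted scores of n_pupils guessing machines, as they would appear on the board."""
    rng = random.Random(seed)
    return sorted(machine_score(answer_key, rng) for _ in range(n_pupils))


def cutoff(scores, share=0.95):
    """Lowest score that is strictly better than at least `share` of the scores."""
    needed = share * len(scores)
    for k in range(max(scores) + 2):
        if sum(s < k for s in scores) >= needed:
            return k


# Prepare ten "right answers" in advance (here generated at random, an assumption).
key_rng = random.Random(42)
answer_key = [key_rng.choice(CUPS) for _ in range(10)]

scores = simulate_class(30, answer_key)
print("Sorted guessing-machine scores:", scores)
print("A human needs at least this many right:", cutoff(scores))
```

With only 30 results the line will wobble from class to class, which is itself worth pointing out to the pupils: the more guessing-machine runs you collect, the steadier the line becomes.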

Then, let a few pupils try classifying 10 cups. By now the pupils should know how many right answers they need to prove that they can tell the difference.

All this is not really doable in 10 minutes though.