# You observe k heads out of n tosses. Is the coin fair?

I was asked this question with $(n, k) = (400, 220)$ in an interview. Is there a “correct” answer?

Assume the tosses are i.i.d. and the probability of heads is $p=0.5$. The distribution of the number of heads in 400 tosses should then be close to $\text{Normal}(200, 10^2)$, so that 220 heads is 2 standard deviations above the mean. The probability of observing such an outcome (i.e. more than 2 SDs away from the mean in either direction) is slightly less than 5% under this normal approximation (about 4.6%).
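To make the arithmetic concrete, here is a minimal sketch that computes the two-sided tail probability both ways: with the normal approximation used above (no continuity correction) and with the exact binomial sum. The helper `normal_cdf` and the variable names are mine, not part of the interview question; only the standard library is used.

```python
from math import comb, erf, sqrt

n, k = 400, 220   # tosses and observed heads, from the question
p0 = 0.5          # null hypothesis: the coin is fair

mean = n * p0                   # expected number of heads under the null
sd = sqrt(n * p0 * (1 - p0))    # standard deviation under the null
z = (k - mean) / sd             # how many SDs the observation is from the mean

def normal_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Two-sided tail under the normal approximation (no continuity correction).
p_normal = 2 * (1 - normal_cdf(abs(z)))

# Exact two-sided tail: P(|X - mean| >= |k - mean|) for X ~ Binomial(n, 0.5).
dev = abs(k - mean)
p_exact = sum(comb(n, x) for x in range(n + 1) if abs(x - mean) >= dev) / 2**n

print(f"z = {z:.2f}")
print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial:       {p_exact:.4f}")
```

The two numbers need not agree exactly, because the approximation ignores the discreteness of the head count. Either way the outcome sits right around the conventional 5% threshold.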

The interviewer told me, essentially, “if I observe something >= 2 SDs from the mean, I conclude that something else is going on. I would bet against the coin being fair.” That’s reasonable — after all, that’s what most hypothesis tests do. But is that the end of the story? For the interviewer that seemed to be the “correct” answer. What I’m asking here is whether some nuance is justified.

I couldn’t help but point out that deciding that the coin is not fair is a bizarre conclusion in this coin-tossing context. Am I right to say that? I’ll try and explain below.

First of all, I — and I would assume most people as well — have a strong prior about coins: they’re very likely to be fair. Of course that depends on what we mean by fair — one possibility would be to define “fair” as “having a probability of heads ‘close’ to 0.5, say between 0.49 and 0.51.”

(You could also define ‘fair’ as meaning that the probability of heads is exactly 0.50, in which case having a perfectly fair coin now seems rather unlikely.)

Your prior might depend not only on your general beliefs about coins but also on the context. If you pulled the coin out of your own pocket, you might be virtually certain that it’s fair; if your magician friend pulled it out of his, your prior might put more weight on double-headed coins.

In any case, it’s easy to come up with reasonable priors that (i) put a large probability on the coin being fair and (ii) leave your posterior quite similar to your prior, even after observing 220 heads. You’d then conclude that the coin was very likely to be fair, despite observing an outcome 2 SDs from the mean.
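A rough sketch of such a calculation follows. For simplicity it takes "fair" to mean $p = 0.5$ exactly and splits the remaining prior mass equally over a list of alternative values of $p$. The prior weights (0.99 and 0.95), the function `posterior_fair`, and both sets of alternatives are illustrative assumptions of mine, not something the question pins down.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n i.i.d. tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_fair(prior_fair, alt_ps, n=400, k=220):
    """Posterior P(p = 0.5 | k heads in n tosses).

    The prior puts mass prior_fair on p = 0.5 and spreads the rest
    equally over the alternative values in alt_ps (an assumption).
    """
    like_fair = binom_pmf(k, n, 0.5)
    like_alt = sum(binom_pmf(k, n, p) for p in alt_ps) / len(alt_ps)
    num = prior_fair * like_fair
    return num / (num + (1 - prior_fair) * like_alt)

# Alternative 1: every unfair coin has p = 0.55 = 220/400, the single
# alternative that the observed data favour most.
worst_case = [0.55]

# Alternative 2: unfair coins spread evenly over 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100) if i != 50]

for prior in (0.99, 0.95):
    print(prior, posterior_fair(prior, worst_case), posterior_fair(prior, grid))
```

Even against the single most favourable alternative, the likelihood ratio is only about $e^{-2}$ against the fair coin, so a strong prior is dented rather than overturned. When the unfair coins are spread over a wide range of $p$, most of them explain 220 heads worse than a fair coin does.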

In fact, you could also construct examples where observing 220 heads in 400 tosses makes your posterior put more weight on the coin being fair, for example if all unfair coins have a probability of heads in $\{0, 1\}$ (double-tailed or double-headed coins), since such a coin could never produce a mix of heads and tails.
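To spell that example out, write $\pi_0 > 0$ for the prior probability that the coin is fair (the symbol $\pi_0$ is my notation). A coin with $p = 0$ or $p = 1$ assigns probability zero to 220 heads in 400 tosses, so whatever the prior split between those two unfair cases, Bayes' rule gives

$$
P(\text{fair} \mid k = 220)
= \frac{\pi_0 \binom{400}{220} 2^{-400}}{\pi_0 \binom{400}{220} 2^{-400} + (1 - \pi_0) \cdot 0}
= 1 .
$$

Under this (admittedly extreme) prior, the 2-SD outcome moves the posterior from $\pi_0$ all the way up to 1.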

Can anyone shed some light on this for me?

Whuber put a very interesting link in the comments: “You Can Load a Die, But You Can’t Bias a Coin.” From page 3:

> It does not make sense to say that the coin has a probability p of
> heads, because it can be completely determined by the manner in which
> it is tossed— unless it is tossed high in the air with a rapid spin
> and caught in the air with no bouncing, in which case p = 1/2.

Pretty cool! This ties into my question in an interesting way: suppose we know that the coin is being “tossed high in the air with a rapid spin and caught in the air with no bouncing.” Then we definitely shouldn’t reject the hypothesis that the coin is fair (where “fair” now means “having p=1/2 when tossed in the way described above”), because we effectively have a prior that puts all probability on the coin being fair. Maybe that justifies to some degree why I am uncomfortable rejecting the null after 220 heads are observed.