# Article about misuse of statistical method in NYTimes

Consider the following experiment. Suppose there was reason to believe that a coin was slightly weighted toward heads. In a test, the coin comes up heads 527 times out of 1,000.

Is this significant evidence that the
coin is weighted?

Classical analysis says yes. With a
fair coin, the chances of getting 527
or more heads in 1,000 flips is less
than 1 in 20, or 5 percent, the
conventional cutoff. To put it another
way: the experiment finds evidence of
a weighted coin “with 95 percent
confidence.”

Yet many statisticians do not buy it.
One in 20 is the probability of
getting any number of heads above 526
in 1,000 throws. That is, it is the
sum of the probability of flipping
527, the probability of flipping 528,
529 and so on.

But the experiment did not find all of
the numbers in that range; it found
just one — 527. It is thus more
accurate, these experts say, to
calculate the probability of getting
that one number — 527 — if the coin is
weighted, and compare it with the
probability of getting the same number
if the coin is fair.

Statisticians can show that this ratio
cannot be higher than about 4 to 1,
according to Paul Speckman, a
statistician, who, with Jeff Rouder, a
psychologist, provided the example.

First question: This is new to me. Does anybody have a reference where I can find the exact calculation, can you help me by giving the exact calculation yourself, or can you point me to some material where I can find similar examples?

Bayes devised a way to update the
probability for a hypothesis as new
evidence comes in.

So in evaluating the strength of a
given finding, Bayesian (pronounced
BAYZ-ee-un) analysis incorporates
known probabilities, if available,
from outside the study.

It might be called the “Yeah, right”
effect. If a study finds that kumquats
reduce the risk of heart disease by 90
percent, that a treatment cures
alcohol addiction in a week, that
sensitive parents are twice as likely
to give birth to a girl as to a boy,
the Bayesian response matches that of
the native skeptic: Yeah, right. The
study findings are weighed against
what is observable out in the world.

In at least one area of medicine —
diagnostic screening tests —
researchers already use known
probabilities to evaluate new
findings. For instance, a new
lie-detection test may be 90 percent
accurate, correctly flagging 9 out of
10 liars. But if it is given to a
population of 100 people already known
to include 10 liars, the test is a lot
less impressive.

It correctly identifies 9 of the 10
liars and misses one; but it
incorrectly identifies 9 of the other
90 as lying. Dividing the so-called
true positives (9) by the total number
of people the test flagged (18) gives
an accuracy rate of 50 percent. The
“false positives” and “false
negatives” depend on the known rates
in the population.
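The arithmetic in the quoted passage can be reproduced in a few lines. This is only a sketch of the article's example: the function and parameter names are made up for illustration, and the 10 percent false-positive rate is inferred from the article's "9 of the other 90".

```python
def positive_predictive_value(population: int, liars: int,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Share of flagged people who really are liars."""
    truthful = population - liars
    true_positives = sensitivity * liars             # liars correctly flagged
    false_positives = false_positive_rate * truthful  # truthful people wrongly flagged
    return true_positives / (true_positives + false_positives)


# Numbers from the article: 100 people, 10 liars, 9 of 10 liars caught,
# and 9 of the 90 truthful people flagged by mistake (assumed rate 0.1).
ppv = positive_predictive_value(100, 10, sensitivity=0.9, false_positive_rate=0.1)
print(ppv)
```

Changing `liars` while keeping the two error rates fixed shows how strongly the result depends on the base rate in the population.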

Second question: How exactly do you judge whether a new finding is “real” or not with this method? And isn’t this as arbitrary as the 5% barrier, because of the use of some preset prior probability?

I will answer the first question in detail.

With a fair coin, the chances of
getting 527 or more heads in 1,000
flips is less than 1 in 20, or 5
percent, the conventional cutoff.

For a fair coin, the number of heads in 1000 trials follows the binomial distribution with number of trials $n=1000$ and probability $p=1/2$. The probability of getting 527 or more heads is then

$$P(B(1000,1/2)\ge 527)$$

This can be calculated with any statistical software package. In R, `pbinom` with `lower.tail=FALSE` returns $P(X>q)$, so we pass $q=526$:

    > pbinom(526,1000,1/2,lower.tail=FALSE)
    0.04684365


So the probability that a fair coin gives more than 526 heads is approximately 0.047, which is just under the 5% cutoff mentioned in the article.
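As a cross-check that does not rely on a statistics package, the same tail probability can be summed exactly with integer arithmetic. This is a minimal sketch in standard-library Python rather than R; the function name `upper_tail_fair` is made up for illustration.

```python
from math import comb


def upper_tail_fair(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 1/2), summed exactly before dividing."""
    # Count the outcomes with at least k heads among the 2**n equally likely ones.
    favourable = sum(comb(n, j) for j in range(k, n + 1))
    return favourable / 2 ** n


p_value = upper_tail_fair(1000, 527)
print(p_value)
```

The result should agree with the `pbinom` value above to the digits shown there.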

The following statement

To put it another way: the experiment
finds evidence of a weighted coin
“with 95 percent confidence.”

is debatable. I would be reluctant to say it, since 95% confidence can be interpreted in several ways.

Next we turn to

But the experiment did not find all of
the numbers in that range; it found
just one — 527. It is thus more
accurate, these experts say, to
calculate the probability of getting
that one number — 527 — if the coin is
weighted, and compare it with the
probability of getting the same number
if the coin is fair.

Here we compare two events: $B(1000,1/2)=527$ (fair coin) and $B(1000,p)=527$ (weighted coin). Substituting the formulas for the probabilities of these events and noting that the binomial coefficient cancels out, we get

$$\frac{P(B(1000,p)=527)}{P(B(1000,1/2)=527)}=\frac{p^{527}(1-p)^{473}}{(1/2)^{1000}}.$$

This is a function of $p$, so we can find its minima or maxima. From the article we may infer that we need the maximum:

Statisticians can show that this ratio
cannot be higher than about 4 to 1,
according to Paul Speckman, a
statistician, who, with Jeff Rouder, a
psychologist, provided the example.

To make the maximisation easier, take the logarithm of the ratio, differentiate with respect to $p$ and set the derivative to zero. The solution is

$$p=\frac{527}{1000}.$$
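Written out, with $\ell(p)$ used here only as a label for the logarithm of the ratio, the first-order condition is

$$\ell(p)=527\log p+473\log(1-p)+1000\log 2,\qquad \ell'(p)=\frac{527}{p}-\frac{473}{1-p}=0\;\Longrightarrow\;527(1-p)=473\,p,$$

which rearranges to the value of $p$ given above.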

We can check that it is really a maximum, for example with the second derivative test. Substituting it into the formula we get

$$\frac{(527/1000)^{527}(473/1000)^{473}}{(1/2)^{1000}}\approx 4.3$$

So the ratio is 4.3 to 1, which agrees with the article.
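The number can be checked numerically. The sketch below, in standard-library Python with the made-up function name `likelihood_ratio`, evaluates the ratio on the log scale to avoid underflow from terms like $(1/2)^{1000}$, and compares the value at $p=527/1000$ with a grid of other candidate values of $p$.

```python
from math import exp, log


def likelihood_ratio(p: float, heads: int = 527, flips: int = 1000) -> float:
    """Likelihood of `heads` for a coin with P(heads) = p, relative to a fair coin."""
    tails = flips - heads
    # The binomial coefficient cancels, so only the power terms remain.
    log_ratio = heads * log(p) + tails * log(1 - p) - flips * log(0.5)
    return exp(log_ratio)


best = likelihood_ratio(527 / 1000)
# Grid of step 0.001 over (0, 1); it contains 0.527 itself.
grid_max = max(likelihood_ratio(k / 1000) for k in range(1, 1000))
print(best, grid_max)
```

Both printed values should be roughly 4.3, and no grid point should beat $p=0.527$.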