I wanted to understand Fisher's exact test better, so I devised the following toy example, where f and m correspond to female and male, and n and y correspond to “soda consumption”, like this:

    > soda_gender
      f m
    n 0 5
    y 5 0

Obviously, this is a drastic simplification, but I didn’t want the context to get in the way. Here I just assumed that males don’t drink soda and females drink soda, and wanted to see if the statistical procedures come to the same conclusion.

When I run Fisher's exact test in R, I get the following results:

    > fisher.test(soda_gender)

            Fisher's Exact Test for Count Data

    data:  soda_gender
    p-value = 0.007937
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
     0.0000000 0.4353226
    sample estimates:
    odds ratio
             0

Here, since the p-value is 0.007937, we would conclude that gender and soda consumption are associated.

I know that Fisher's exact test is related to the hypergeometric distribution, so I wanted to get a similar result using that. In other words, you can view this problem as follows: there are 10 balls, of which 5 are labeled “male” and 5 are labeled “female”; you draw 5 balls at random without replacement and see 0 male balls. What is the chance of this observation? To answer this question, I used the following command:

    > phyper(q=0,m=5,n=5,k=5,lower.tail=TRUE)
    [1] 0.003968254

My questions are:

1) How come the two results are different?

2) Is there anything incorrect or not rigorous in my reasoning above?

**Answer**

Fisher’s exact test works by conditioning on the table margins (in this case, 5 males and 5 females, and 5 soda drinkers and 5 non-drinkers). Under the null hypothesis, the cell probabilities for a male soda drinker, male non-soda drinker, female soda drinker, and female non-soda drinker are all equal (0.25) because of the margin totals.
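To make the conditioning concrete, here is a minimal sketch in R that lists every 2×2 table compatible with these margins, together with its probability under the null. The variable names (`n_female`, `n_male`, `n_drinkers`, `table_probs`) are illustrative and not from the question; the only assumption is that, with all margins fixed, the table is determined by a single cell (here, the number of male soda drinkers).

```r
# Margins from the question: 5 females, 5 males, 5 soda drinkers.
n_female   <- 5
n_male     <- 5
n_drinkers <- 5

# With all margins fixed, one cell determines the whole table.
# x = number of male soda drinkers, which here ranges from 0 to 5.
x <- 0:min(n_male, n_drinkers)

# Null probability of each table: hypergeometric with m = males,
# n = females, k = soda drinkers drawn without replacement.
table_probs <- dhyper(x, m = n_male, n = n_female, k = n_drinkers)

# One row per possible table; the probabilities sum to 1.
data.frame(male_drinkers = x, probability = table_probs)
```

The two extreme tables (0 or 5 male soda drinkers) each have probability 1/252, which is the value returned by the `phyper` call in the question.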

For the particular table you used in the FET, the only other table that is “at least as unlikely” under the null hypothesis is its converse: 5 female non-soda drinkers and 5 male soda drinkers. That is why doubling the probability you obtained from the hypergeometric distribution (2 × 0.003968254 ≈ 0.007937) gives you the FET p-value.
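The doubling can be checked directly. The sketch below rebuilds the table from the question, sums the null probabilities of all tables that are no more likely than the observed one, and compares the result with `fisher.test`. The names `probs`, `p_observed` and `p_two_sided` are illustrative, and the small tolerance is an assumption added to guard against floating-point ties rather than something taken from the question.

```r
# The table from the question: rows are soda consumption (n/y),
# columns are gender (f/m). matrix() fills column by column.
soda_gender <- matrix(c(0, 5, 5, 0), nrow = 2,
                      dimnames = list(c("n", "y"), c("f", "m")))

# Null probability of every table with these margins, indexed by the
# number of male soda drinkers (0 to 5).
probs <- dhyper(0:5, m = 5, n = 5, k = 5)

# Probability of the observed table (0 male soda drinkers).
p_observed <- probs[1]

# Two-sided p-value: sum over all tables no more likely than the observed one.
p_two_sided <- sum(probs[probs <= p_observed * (1 + 1e-7)])

p_observed                                          # 1/252, same as phyper(0, 5, 5, 5)
p_two_sided                                         # 2/252
fisher.test(soda_gender)$p.value                    # two-sided, agrees with p_two_sided
fisher.test(soda_gender, alternative = "less")$p.value  # one-sided, agrees with p_observed
```

Read this way, the `phyper` result in the question corresponds to a one-sided test, while the default `fisher.test` output is two-sided.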

**Attribution**

*Source: Link, Question Author: Alby, Answer Author: AdamO*