Reproduce figure of “Computer Age Statistical Inference” from Efron and Hastie

The summarized version of my question

(26th December 2018)

I am trying to reproduce Figure 2.2 from Computer Age Statistical Inference by Efron and Hastie, but for some reason that I’m not able to understand, the numbers are not corresponding with the ones in the book.

Assume we are trying to decide between two possible probability density functions for the observed data $$xx$$, a null hypothesis density $$f0(x)f_0\left(x\right)$$ and an alternative density $$f1(x)f_1\left(x\right)$$. A testing rule $$t(x)t\left(x\right)$$ says which choice, $$00$$ or $$11$$, we will make having observed data $$xx$$. Any such rule has two associated frequentist error probabilities: choosing $$f1f_1$$ when actually $$f0f_0$$ generated $$xx$$, and vice versa,

$$α=Prf0{t(x)=1}, \alpha = \text{Pr}_{f_0} \{t(x)=1\},$$
$$β=Prf1{t(x)=0}. \beta = \text{Pr}_{f_1} \{t(x)=0\}.$$

Let $$L(x)L(x)$$ be the likelihood ratio,

$$L(x)=f1(x)f0(x) L(x) = \frac{f_1\left(x\right)}{f_0\left(x\right)}$$

So, the Neyman–Pearson lemma says that the testing rule of the form $$tc(x)t_c(x)$$ is the optimum hypothesis-testing algorithm

$$tc(x)={1if log L(x)≥c0if log L(x)

For $$f0∼N(0,1),f1∼N(0.5,1)f_0 \sim \mathcal{N} \left(0,1\right), \enspace f_1 \sim \mathcal{N} \left(0.5,1\right)$$, and sample size $$n=10n=10$$ what would be the values for $$α\alpha$$ and $$β\beta$$ for a cutoff $$c=0.4c=0.4$$?

• From Figure 2.2 of Computer Age Statistical Inference by Efron and Hastie we have:
• $$α=0.10\alpha=0.10$$ and $$β=0.38\beta=0.38$$ for a cutoff $$c=0.4c=0.4$$
• I found $$α=0.15\alpha=0.15$$ and $$β=0.30\beta=0.30$$ for a cutoff $$c=0.4c=0.4$$ using two different approaches: A) simulation and B) analytically.

I would appreciate if someone could explain to me how to obtain $$α=0.10\alpha=0.10$$ and $$β=0.38\beta=0.38$$ for a cutoff $$c=0.4c=0.4$$. Thanks.

The summarized version of my question finishes here. From now you will find:

• In section A) details and complete python code of my simulation approach.
• In section B) details and complete python code of the analytically approach.

A) My simulation approach with complete python code and explanations

(20th December 2018)

From the book …

In the same spirit, the Neyman–Pearson lemma provides an optimum hypothesis-testing algorithm. This is perhaps the most elegant of frequentist constructions. In its simplest formulation, the NP lemma assumes we are trying to decide between two possible probability density functions for the observed data $$xx$$, a null hypothesis density $$f0(x)f_0\left(x\right)$$ and an alternative density $$f1(x)f_1\left(x\right)$$. A testing rule $$t(x)t\left(x\right)$$ says which choice, $$00$$ or $$11$$, we will make having observed data $$xx$$. Any such rule has two associated frequentist error probabilities: choosing $$f1f_1$$ when actually $$f0f_0$$ generated $$xx$$, and vice versa,

$$α=Prf0{t(x)=1}, \alpha = \text{Pr}_{f_0} \{t(x)=1\},$$
$$β=Prf1{t(x)=0}. \beta = \text{Pr}_{f_1} \{t(x)=0\}.$$

Let $$L(x)L(x)$$ be the likelihood ratio,

$$L(x)=f1(x)f0(x) L(x) = \frac{f_1\left(x\right)}{f_0\left(x\right)}$$

(Source: Efron, B., & Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge: Cambridge University Press.)

So, I implemented the python code below …

import numpy as np

def likelihood_ratio(x, f1_density, f0_density):
return np.prod(f1_density.pdf(x)) / np.prod(f0_density.pdf(x))


Again, from the book …

and define the testing rule $$tc(x)t_c(x)$$ by

$$tc(x)={1if log L(x)≥c0if log L(x)

(Source: Efron, B., & Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge: Cambridge University Press.)

So, I implemented the python code below …

def Neyman_Pearson_testing_rule(x, cutoff, f0_density, f1_density):
lr = likelihood_ratio(x, f1_density, f0_density)
llr = np.log(lr)

if llr >= cutoff:
return 1
else:
return 0


Finally, from the book …

Where it is possible to conclude that a cutoff $$c=0.4c=0.4$$ will imply $$α=0.10\alpha=0.10$$ and $$β=0.38\beta=0.38$$.

So, I implemented the python code below …

def alpha_simulation(cutoff, f0_density, f1_density, sample_size, replicates):
NP_test_results = []

for _ in range(replicates):
x = f0_density.rvs(size=sample_size)
test = Neyman_Pearson_testing_rule(x, cutoff, f0_density, f1_density)
NP_test_results.append(test)

return np.sum(NP_test_results) / float(replicates)

def beta_simulation(cutoff, f0_density, f1_density, sample_size, replicates):
NP_test_results = []

for _ in range(replicates):
x = f1_density.rvs(size=sample_size)
test = Neyman_Pearson_testing_rule(x, cutoff, f0_density, f1_density)
NP_test_results.append(test)

return (replicates - np.sum(NP_test_results)) / float(replicates)


and the code …

from scipy import stats as st

f0_density = st.norm(loc=0, scale=1)
f1_density = st.norm(loc=0.5, scale=1)

sample_size = 10
replicates = 12000

cutoffs = []
alphas_simulated = []
betas_simulated = []
for cutoff in np.arange(3.2, -3.6, -0.4):
alpha_ = alpha_simulation(cutoff, f0_density, f1_density, sample_size, replicates)
beta_ = beta_simulation(cutoff, f0_density, f1_density, sample_size, replicates)

cutoffs.append(cutoff)
alphas_simulated.append(alpha_)
betas_simulated.append(beta_)


and the code …

import matplotlib.pyplot as plt
%matplotlib inline

# Reproducing Figure 2.2 from simulation results.
plt.xlabel('$$\\alpha$$')
plt.ylabel('$$\\beta$$')
plt.xlim(-0.1, 1.05)
plt.ylim(-0.1, 1.05)
plt.axvline(x=0, color='b', linestyle='--')
plt.axvline(x=1, color='b', linestyle='--')
plt.axhline(y=0, color='b', linestyle='--')
plt.axhline(y=1, color='b', linestyle='--')
figure_2_2 = plt.plot(alphas_simulated, betas_simulated, 'ro', alphas_simulated, betas_simulated, 'k-')


to obtain something like this:

that looks similar to the original figure from the book, but the 3-tuples $$(c,α,β)(c,\alpha,\beta)$$ from my simulation has different values of $$α\alpha$$ and $$β\beta$$ when compared with the ones of the book for the same cutoff $$cc$$. For example:

• from the book we have $$(c=0.4,α=0.10,β=0.38)(c=0.4, \alpha=0.10, \beta=0.38)$$
• from my simulation we have:
• $$(c=0.4,α=0.15,β=0.30)(c=0.4, \alpha=0.15, \beta=0.30)$$
• $$(c=0.8,α=0.10,β=0.39)(c=0.8, \alpha=0.10, \beta=0.39)$$

It seems that the cutoff $$c=0.8c=0.8$$ from my simulation is equivalent with the cutoff $$c=0.4c=0.4$$ from the book.

I would appreciate if someone could explain to me what I am doing wrong here. Thanks.

B) My calculation approach with complete python code and explanations

(26th December 2018)

Still trying to understand the difference between the results of my simulation (alpha_simulation(.), beta_simulation(.)) and those presented in the book, with the help of a statistician (Sofia) friend of mine, we calculated $$α\alpha$$ and $$β\beta$$ analytically instead of via simulation, so …

Once that

$$f0∼N(0,1) f_0 \sim \mathcal{N} \left(0,1\right)$$
$$f1∼N(0.5,1) f_1 \sim \mathcal{N} \left(0.5,1\right)$$

then

$$f(x|μ,σ2)=n∏i=11√2πσ2e−(xi−μ)22σ2 f\left(x \;\middle\vert\; \mu, \sigma^2 \right) = \prod_{i = 1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\left(x_i-\mu\right)^2}{2\sigma^2}}$$

Moreover,

$$L(x)=f1(x)f0(x) L(x) = \frac{f_1\left(x\right)}{f_0\left(x\right)}$$

so,

$$L(x)=f1(x|μ1,σ2)f0(x|μ0,σ2)=∏ni=11√2πσ2e−(xi−μ1)22σ2∏ni=11√2πσ2e−(xi−μ0)22σ2 L(x) = \frac{f_1\left(x\;\middle\vert\; \mu_1, \sigma^2\right)}{f_0\left(x\;\middle\vert\; \mu_0, \sigma^2\right)} = \frac{\prod_{i = 1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\left(x_i-\mu_1\right)^2}{2\sigma^2}}}{\prod_{i = 1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\left(x_i-\mu_0\right)^2}{2\sigma^2}}}$$

Therefore, by performing some algebraic simplifications (as below), we will have:

$$L(x)=(1√2πσ2)ne−∑ni=1(xi−μ1)22σ2(1√2πσ2)ne−∑ni=1(xi−μ0)22σ2 L(x) = \frac{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{\sum_{i = 1}^{n} \left(x_i-\mu_1\right)^2}{2\sigma^2}}}{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{\sum_{i = 1}^{n} \left(x_i-\mu_0\right)^2}{2\sigma^2}}}$$

$$=e−∑ni=1(xi−μ1)2+∑ni=1(xi−μ0)22σ2 = e^{\frac{-\sum_{i = 1}^{n} \left(x_i-\mu_1\right)^2 + \sum_{i = 1}^{n} \left(x_i-\mu_0\right)^2}{2\sigma^2}}$$

$$=e−∑ni=1(x2i−2xiμ1+μ21)+∑ni=1(x2i−2xiμ0+μ20)2σ2 = e^{\frac{-\sum_{i = 1}^{n} \left(x_i^2 -2x_i\mu_1 + \mu_1^2\right) + \sum_{i = 1}^{n} \left(x_i^2 -2x_i\mu_0 + \mu_0^2\right)}{2\sigma^2}}$$

$$=e−∑ni=1x2i+2μ1∑ni=1xi−∑ni=1μ21+∑ni=1x2i−2μ0∑ni=1xi+∑ni=1μ202σ2 = e^{\frac{-\sum_{i = 1}^{n}x_i^2 + 2\mu_1\sum_{i = 1}^{n}x_i - \sum_{i = 1}^{n}\mu_1^2 + \sum_{i = 1}^{n}x_i^2 - 2\mu_0\sum_{i = 1}^{n}x_i + \sum_{i = 1}^{n}\mu_0^2}{2\sigma^2}}$$

$$=e2(μ1−μ0)∑ni=1xi+n(μ20−μ21)2σ2 = e^{\frac{2\left(\mu_1-\mu_0\right)\sum_{i = 1}^{n}x_i + n\left(\mu_0^2-\mu_1^2\right)}{2\sigma^2}}$$.

So, if

$$tc(x)={1if log L(x)≥c0if log L(x)

then, for $$log L(x)≥c\text{log } L(x) \ge c$$ we will have:

$$log (e2(μ1−μ0)∑ni=1xi+n(μ20−μ21)2σ2)≥c \text{log } \left( e^{\frac{2\left(\mu_1-\mu_0\right)\sum_{i = 1}^{n}x_i + n\left(\mu_0^2-\mu_1^2\right)}{2\sigma^2}} \right) \ge c$$

$$2(μ1−μ0)∑ni=1xi+n(μ20−μ21)2σ2≥c \frac{2\left(\mu_1-\mu_0\right)\sum_{i = 1}^{n}x_i + n\left(\mu_0^2-\mu_1^2\right)}{2\sigma^2} \ge c$$

$$n∑i=1xi≥2cσ2−n(μ20−μ21)2(μ1−μ0) \sum_{i = 1}^{n}x_i \ge \frac{2c\sigma^2 - n\left(\mu_0^2-\mu_1^2\right)}{2\left(\mu_1-\mu_0\right)}$$

$$n∑i=1xi≥2cσ22(μ1−μ0)−n(μ20−μ21)2(μ1−μ0) \sum_{i = 1}^{n}x_i \ge \frac{2c\sigma^2}{2\left(\mu_1-\mu_0\right)} - \frac{n\left(\mu_0^2-\mu_1^2\right)}{2\left(\mu_1-\mu_0\right)}$$

$$n∑i=1xi≥cσ2(μ1−μ0)−n(μ20−μ21)2(μ1−μ0) \sum_{i = 1}^{n}x_i \ge \frac{c\sigma^2}{\left(\mu_1-\mu_0\right)} - \frac{n\left(\mu_0^2-\mu_1^2\right)}{2\left(\mu_1-\mu_0\right)}$$

$$n∑i=1xi≥cσ2(μ1−μ0)+n(μ21−μ20)2(μ1−μ0) \sum_{i = 1}^{n}x_i \ge \frac{c\sigma^2}{\left(\mu_1-\mu_0\right)} + \frac{n\left(\mu_1^2-\mu_0^2\right)}{2\left(\mu_1-\mu_0\right)}$$

$$n∑i=1xi≥cσ2(μ1−μ0)+n(μ1−μ0)(μ1+μ0)2(μ1−μ0) \sum_{i = 1}^{n}x_i \ge \frac{c\sigma^2}{\left(\mu_1-\mu_0\right)} + \frac{n\left(\mu_1-\mu_0\right)\left(\mu_1+\mu_0\right)}{2\left(\mu_1-\mu_0\right)}$$

$$n∑i=1xi≥cσ2(μ1−μ0)+n(μ1+μ0)2 \sum_{i = 1}^{n}x_i \ge \frac{c\sigma^2}{\left(\mu_1-\mu_0\right)} + \frac{n\left(\mu_1+\mu_0\right)}{2}$$

$$(1n)n∑i=1xi≥(1n)(cσ2(μ1−μ0)+n(μ1+μ0)2) \left(\frac{1}{n}\right) \sum_{i = 1}^{n}x_i \ge \left(\frac{1}{n}\right) \left( \frac{c\sigma^2}{\left(\mu_1-\mu_0\right)} + \frac{n\left(\mu_1+\mu_0\right)}{2}\right)$$

$$∑ni=1xin≥cσ2n(μ1−μ0)+(μ1+μ0)2 \frac{\sum_{i = 1}^{n}x_i}{n} \ge \frac{c\sigma^2}{n\left(\mu_1-\mu_0\right)} + \frac{\left(\mu_1+\mu_0\right)}{2}$$

$$ˉx≥cσ2n(μ1−μ0)+(μ1+μ0)2 \bar{x} \ge \frac{c\sigma^2}{n\left(\mu_1-\mu_0\right)} + \frac{\left(\mu_1+\mu_0\right)}{2}$$

$$ˉx≥k, where k=cσ2n(μ1−μ0)+(μ1+μ0)2 \bar{x} \ge k \text{, where } k = \frac{c\sigma^2}{n\left(\mu_1-\mu_0\right)} + \frac{\left(\mu_1+\mu_0\right)}{2}$$

resulting in

$$tc(x)={1if ˉx≥k0if ˉx

In order to calculate $$α\alpha$$ and $$β\beta$$, we know that:

$$α=Prf0{t(x)=1}, \alpha = \text{Pr}_{f_0} \{t(x)=1\},$$
$$β=Prf1{t(x)=0}. \beta = \text{Pr}_{f_1} \{t(x)=0\}.$$

so,

$$α=Prf0{ˉx≥k},β=Prf1{ˉx

For $$α\alpha$$

$$α=Prf0{ˉx≥k}=Prf0{ˉx−μ0≥k−μ0} \alpha = \text{Pr}_{f_0} \{\bar{x} \ge k\} = \text{Pr}_{f_0} \{\bar{x} - \mu_0 \ge k - \mu_0\}$$

$$α=Prf0{ˉx−μ0σ√n≥k−μ0σ√n} \alpha = \text{Pr}_{f_0} \left\{\frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} \ge \frac{k - \mu_0}{\frac{\sigma}{\sqrt{n}}}\right\}$$

$$α=Prf0{z-score≥k−μ0σ√n} where k=cσ2n(μ1−μ0)+(μ1+μ0)2 \alpha = \text{Pr}_{f_0} \left\{\text{z-score} \ge \frac{k - \mu_0}{\frac{\sigma}{\sqrt{n}}}\right\} \enspace \enspace \text{ where } k = \frac{c\sigma^2}{n\left(\mu_1-\mu_0\right)} + \frac{\left(\mu_1+\mu_0\right)}{2}$$

so, I implemented the python code below:

def alpha_calculation(cutoff, m_0, m_1, variance, sample_size):
c = cutoff
n = sample_size
sigma = np.sqrt(variance)

k = (c*variance)/(n*(m_1-m_0)) + (m_1+m_0)/2.0

z_alpha = (k-m_0)/(sigma/np.sqrt(n))

# Pr{z_score >= z_alpha}
return 1.0 - st.norm(loc=0, scale=1).cdf(z_alpha)


For $$β\beta$$

$$β=Prf1{ˉx

$$β=Prf1{ˉx−μ1σ√n

$$β=Prf1{z-score

resulting in the python code below:

def beta_calculation(cutoff, m_0, m_1, variance, sample_size):
c = cutoff
n = sample_size
sigma = np.sqrt(variance)

k = (c*variance)/(n*(m_1-m_0)) + (m_1+m_0)/2.0

z_beta = (k-m_1)/(sigma/np.sqrt(n))

# Pr{z_score < z_beta}
return st.norm(loc=0, scale=1).cdf(z_beta)


and the code …

alphas_calculated = []
betas_calculated = []
for cutoff in cutoffs:
alpha_ = alpha_calculation(cutoff, 0.0, 0.5, 1.0, sample_size)
beta_ = beta_calculation(cutoff, 0.0, 0.5, 1.0, sample_size)

alphas_calculated.append(alpha_)
betas_calculated.append(beta_)


and the code …

# Reproducing Figure 2.2 from calculation results.
plt.xlabel('$$\\alpha$$')
plt.ylabel('$$\\beta$$')
plt.xlim(-0.1, 1.05)
plt.ylim(-0.1, 1.05)
plt.axvline(x=0, color='b', linestyle='--')
plt.axvline(x=1, color='b', linestyle='--')
plt.axhline(y=0, color='b', linestyle='--')
plt.axhline(y=1, color='b', linestyle='--')
figure_2_2 = plt.plot(alphas_calculated, betas_calculated, 'ro', alphas_calculated, betas_calculated, 'k-')


to obtain a figure and values for $$α\alpha$$ and $$β\beta$$ very similar to my first simulation

And finally to compare the results between simulation and calculation side by side …

df = pd.DataFrame({
'cutoff': np.round(cutoffs, decimals=2),
'simulated alpha': np.round(alphas_simulated, decimals=2),
'simulated beta': np.round(betas_simulated, decimals=2),
'calculated alpha': np.round(alphas_calculated, decimals=2),
'calculate beta': np.round(betas_calculated, decimals=2)
})
df


resulting in

This shows that the results of the simulation are very similar (if not the same) to those of the analytical approach.

In short, I still need help figuring out what might be wrong in my calculations. Thanks. 🙂

Answer

In the website of the book Computer Age Statistical Inference, there is a discussion session where Trevor Hastie and Brad Efron often reply to several questions. So, I posted this question there (as below) and received from Trevor Hastie the confirmation that there is an error in the book that will be fixed (in other words, my simulations and calculations – as implemented in Python in this question – are correct).

When Trevor Hastie replied that “In fact c=.75 for that plot” means that at the figure below (original Figure 2.2 from the book) the cutoff $$cc$$ should be $$c=0.75c=0.75$$ instead of $$c=0.4c=0.4$$:

So, using my functions alpha_simulation(.), beta_simulation(.), alpha_calculation(.) and beta_calculation(.) (whose the full Python code is available in this question) I got $$α=0.10\alpha=0.10$$ and $$β=0.38\beta=0.38$$ for a cutoff $$c=0.75c=0.75$$ as a confirmation that my code is correct.

alpha_simulated_c075 = alpha_simulation(0.75, f0_density, f1_density, sample_size, replicates)
beta_simulated_c075 = beta_simulation(0.75, f0_density, f1_density, sample_size, replicates)

alpha_calculated_c075 = alpha_calculation(0.75, 0.0, 0.5, 1.0, sample_size)
beta_calculated_c075 = beta_calculation(0.75, 0.0, 0.5, 1.0, sample_size)

print("Simulated: c=0.75, alpha={0:.2f}, beta={1:.2f}".format(alpha_simulated_c075, beta_simulated_c075))
print("Calculated: c=0.75, alpha={0:.2f}, beta={1:.2f}".format(alpha_calculated_c075, beta_calculated_c075))


Finally, when Trevor Hastie replied that “… resulting in a threshold for x of .4” it means that $$k=0.4k=0.4$$ in the equation below (see section B from this question):

$$ˉx≥k, where k=cσ2n(μ1−μ0)+(μ1+μ0)2 \bar{x} \ge k \text{, where } k = \frac{c\sigma^2}{n\left(\mu_1-\mu_0\right)} + \frac{\left(\mu_1+\mu_0\right)}{2}$$

resulting in

$$tc(x)={1if ˉx≥k0if ˉx

So, in Python we can get $$k=0.4k=0.4$$ for a cutoff $$c=0.75c=0.75$$ as below:

n = 10
m_0 = 0.0
m_1 = 0.5
variance = 1.0
c = 0.75

k = (c*variance)/(n*(m_1-m_0)) + (m_1+m_0)/2.0
threshold_for_x = k

print("threshold for x (when cutoff c=0.75) = {0:.1f}".format(threshold_for_x))


Attribution
Source : Link , Question Author : Francisco Fonseca , Answer Author : Francisco Fonseca