Reproduce figure of “Computer Age Statistical Inference” from Efron and Hastie

The summarized version of my question

(26th December 2018)

I am trying to reproduce Figure 2.2 from Computer Age Statistical Inference by Efron and Hastie, but for some reason that I’m not able to understand, the numbers are not corresponding with the ones in the book.

Assume we are trying to decide between two possible probability density functions for the observed data $x$: a null hypothesis density $f_0(x)$ and an alternative density $f_1(x)$. A testing rule $t(x)$ says which choice, 0 or 1, we will make having observed data $x$. Any such rule has two associated frequentist error probabilities: choosing $f_1$ when actually $f_0$ generated $x$, and vice versa,

$$\alpha = \Pr_{f_0}\{t(x)=1\},$$
$$\beta = \Pr_{f_1}\{t(x)=0\}.$$

Let L(x) be the likelihood ratio,

$$L(x) = \frac{f_1(x)}{f_0(x)}$$

So, the Neyman–Pearson lemma says that the optimal hypothesis-testing algorithm is the testing rule $t_c(x)$ of the form

$$t_c(x) = \begin{cases} 1 & \text{if } \log L(x) \ge c \\ 0 & \text{if } \log L(x) < c. \end{cases}$$

For $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.5,1)$, and sample size n=10, what would be the values for α and β for a cutoff c=0.4?

  • From Figure 2.2 of Computer Age Statistical Inference by Efron and Hastie we have:
    • α=0.10 and β=0.38 for a cutoff c=0.4
  • I found α=0.15 and β=0.30 for a cutoff c=0.4 using two different approaches: A) simulation and B) analytical calculation.

I would appreciate it if someone could explain to me how to obtain α=0.10 and β=0.38 for a cutoff c=0.4. Thanks.
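For concreteness, here is a minimal self-contained sketch of the analytical calculation that gives me α=0.15 and β=0.30. It relies on the reduction of the log-likelihood-ratio rule to a threshold k on the sample mean, derived step by step in section B below:

```python
import numpy as np
from scipy import stats as st

# Setup from the question: f0 ~ N(0,1), f1 ~ N(0.5,1), n = 10, cutoff c = 0.4.
m_0, m_1, variance, n, c = 0.0, 0.5, 1.0, 10, 0.4
sigma = np.sqrt(variance)

# The rule log L(x) >= c is equivalent to x-bar >= k (derived in section B).
k = (c * variance) / (n * (m_1 - m_0)) + (m_1 + m_0) / 2.0

alpha = 1.0 - st.norm.cdf((k - m_0) / (sigma / np.sqrt(n)))  # Pr_f0{x-bar >= k}
beta = st.norm.cdf((k - m_1) / (sigma / np.sqrt(n)))         # Pr_f1{x-bar < k}

print(f"alpha={alpha:.2f}, beta={beta:.2f}")  # alpha=0.15, beta=0.30
```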

The summarized version of my question finishes here. From now you will find:

  • In section A) details and complete Python code of my simulation approach.
  • In section B) details and complete Python code of my analytical approach.

A) My simulation approach with complete Python code and explanations

(20th December 2018)

From the book …

In the same spirit, the Neyman–Pearson lemma provides an optimum hypothesis-testing algorithm. This is perhaps the most elegant of frequentist constructions. In its simplest formulation, the NP lemma assumes we are trying to decide between two possible probability density functions for the observed data $x$, a null hypothesis density $f_0(x)$ and an alternative density $f_1(x)$. A testing rule $t(x)$ says which choice, 0 or 1, we will make having observed data $x$. Any such rule has two associated frequentist error probabilities: choosing $f_1$ when actually $f_0$ generated $x$, and vice versa,

$$\alpha = \Pr_{f_0}\{t(x)=1\},$$
$$\beta = \Pr_{f_1}\{t(x)=0\}.$$

Let L(x) be the likelihood ratio,

$$L(x) = \frac{f_1(x)}{f_0(x)}$$

(Source: Efron, B., & Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge: Cambridge University Press.)

So, I implemented the Python code below …

import numpy as np

def likelihood_ratio(x, f1_density, f0_density):
    return np.prod(f1_density.pdf(x)) / np.prod(f0_density.pdf(x))

Again, from the book …

and define the testing rule tc(x) by

$$t_c(x) = \begin{cases} 1 & \text{if } \log L(x) \ge c \\ 0 & \text{if } \log L(x) < c. \end{cases}$$

(Source: Efron, B., & Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge: Cambridge University Press.)

So, I implemented the Python code below …

def Neyman_Pearson_testing_rule(x, cutoff, f0_density, f1_density):
    lr = likelihood_ratio(x, f1_density, f0_density)
    llr = np.log(lr)
    
    if llr >= cutoff:
        return 1
    else:
        return 0

Finally, from the book …

[Figure 2.2 from the book: β plotted against α, with points labeled by the cutoff c.]

From the figure it is possible to conclude that a cutoff c=0.4 implies α=0.10 and β=0.38.

So, I implemented the Python code below …

def alpha_simulation(cutoff, f0_density, f1_density, sample_size, replicates):
    NP_test_results = []
    
    for _ in range(replicates):
        x = f0_density.rvs(size=sample_size)
        test = Neyman_Pearson_testing_rule(x, cutoff, f0_density, f1_density)
        NP_test_results.append(test)
    
    return np.sum(NP_test_results) / float(replicates)

def beta_simulation(cutoff, f0_density, f1_density, sample_size, replicates):
    NP_test_results = []
    
    for _ in range(replicates):
        x = f1_density.rvs(size=sample_size)
        test = Neyman_Pearson_testing_rule(x, cutoff, f0_density, f1_density)
        NP_test_results.append(test)
    
    return (replicates - np.sum(NP_test_results)) / float(replicates)

and the code …

from scipy import stats as st
    
f0_density = st.norm(loc=0, scale=1)
f1_density = st.norm(loc=0.5, scale=1)

sample_size = 10
replicates = 12000

cutoffs = []
alphas_simulated = []
betas_simulated = []
for cutoff in np.arange(3.2, -3.6, -0.4):
    alpha_ = alpha_simulation(cutoff, f0_density, f1_density, sample_size, replicates)
    beta_ = beta_simulation(cutoff, f0_density, f1_density, sample_size, replicates)
    
    cutoffs.append(cutoff)
    alphas_simulated.append(alpha_)
    betas_simulated.append(beta_)

and the code …

import matplotlib.pyplot as plt
%matplotlib inline

# Reproducing Figure 2.2 from simulation results.
plt.xlabel('$\\alpha$')
plt.ylabel('$\\beta$')
plt.xlim(-0.1, 1.05)
plt.ylim(-0.1, 1.05)
plt.axvline(x=0, color='b', linestyle='--')
plt.axvline(x=1, color='b', linestyle='--')
plt.axhline(y=0, color='b', linestyle='--')
plt.axhline(y=1, color='b', linestyle='--')
figure_2_2 = plt.plot(alphas_simulated, betas_simulated, 'ro', alphas_simulated, betas_simulated, 'k-')

to obtain something like this:

[Plot of the simulated (α, β) pairs, reproducing the shape of Figure 2.2.]

that looks similar to the original figure from the book, but the 3-tuples (c,α,β) from my simulation have different values of α and β than the ones in the book for the same cutoff c. For example:

  • from the book we have (c=0.4,α=0.10,β=0.38)
  • from my simulation we have:
    • (c=0.4,α=0.15,β=0.30)
    • (c=0.8,α=0.10,β=0.39)

It seems that the cutoff c=0.8 from my simulation is equivalent to the cutoff c=0.4 from the book.

I would appreciate it if someone could explain to me what I am doing wrong here. Thanks.

B) My calculation approach with complete Python code and explanations

(26th December 2018)

Still trying to understand the difference between the results of my simulation (alpha_simulation(.), beta_simulation(.)) and those presented in the book, with the help of a statistician friend of mine (Sofia), we calculated α and β analytically instead of via simulation, so …

Given that

$$f_0 \sim \mathcal{N}(0,1)$$
$$f_1 \sim \mathcal{N}(0.5,1)$$

then

$$f(x \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

Moreover,

$$L(x) = \frac{f_1(x)}{f_0(x)}$$

so,

$$L(x) = \frac{f_1(x \mid \mu_1, \sigma^2)}{f_0(x \mid \mu_0, \sigma^2)} = \frac{\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu_1)^2}{2\sigma^2}}}{\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu_0)^2}{2\sigma^2}}}$$

Therefore, by performing some algebraic simplifications (as below), we will have:

$$L(x) = \frac{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n} e^{-\frac{\sum_{i=1}^{n}(x_i-\mu_1)^2}{2\sigma^2}}}{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n} e^{-\frac{\sum_{i=1}^{n}(x_i-\mu_0)^2}{2\sigma^2}}}$$

$$= e^{\frac{-\sum_{i=1}^{n}(x_i-\mu_1)^2 + \sum_{i=1}^{n}(x_i-\mu_0)^2}{2\sigma^2}}$$

$$= e^{\frac{-\sum_{i=1}^{n}(x_i^2 - 2x_i\mu_1 + \mu_1^2) + \sum_{i=1}^{n}(x_i^2 - 2x_i\mu_0 + \mu_0^2)}{2\sigma^2}}$$

$$= e^{\frac{-\sum_{i=1}^{n}x_i^2 + 2\mu_1\sum_{i=1}^{n}x_i - \sum_{i=1}^{n}\mu_1^2 + \sum_{i=1}^{n}x_i^2 - 2\mu_0\sum_{i=1}^{n}x_i + \sum_{i=1}^{n}\mu_0^2}{2\sigma^2}}$$

$$= e^{\frac{2(\mu_1-\mu_0)\sum_{i=1}^{n}x_i + n(\mu_0^2-\mu_1^2)}{2\sigma^2}}.$$

So, if

$$t_c(x) = \begin{cases} 1 & \text{if } \log L(x) \ge c \\ 0 & \text{if } \log L(x) < c. \end{cases}$$

then, for $\log L(x) \ge c$, we will have:

$$\log\left(e^{\frac{2(\mu_1-\mu_0)\sum_{i=1}^{n}x_i + n(\mu_0^2-\mu_1^2)}{2\sigma^2}}\right) \ge c$$

$$\frac{2(\mu_1-\mu_0)\sum_{i=1}^{n}x_i + n(\mu_0^2-\mu_1^2)}{2\sigma^2} \ge c$$

$$\sum_{i=1}^{n}x_i \ge \frac{2c\sigma^2 - n(\mu_0^2-\mu_1^2)}{2(\mu_1-\mu_0)}$$

$$\sum_{i=1}^{n}x_i \ge \frac{2c\sigma^2}{2(\mu_1-\mu_0)} - \frac{n(\mu_0^2-\mu_1^2)}{2(\mu_1-\mu_0)}$$

$$\sum_{i=1}^{n}x_i \ge \frac{c\sigma^2}{\mu_1-\mu_0} - \frac{n(\mu_0^2-\mu_1^2)}{2(\mu_1-\mu_0)}$$

$$\sum_{i=1}^{n}x_i \ge \frac{c\sigma^2}{\mu_1-\mu_0} + \frac{n(\mu_1^2-\mu_0^2)}{2(\mu_1-\mu_0)}$$

$$\sum_{i=1}^{n}x_i \ge \frac{c\sigma^2}{\mu_1-\mu_0} + \frac{n(\mu_1-\mu_0)(\mu_1+\mu_0)}{2(\mu_1-\mu_0)}$$

$$\sum_{i=1}^{n}x_i \ge \frac{c\sigma^2}{\mu_1-\mu_0} + \frac{n(\mu_1+\mu_0)}{2}$$

$$\left(\frac{1}{n}\right)\sum_{i=1}^{n}x_i \ge \left(\frac{1}{n}\right)\left(\frac{c\sigma^2}{\mu_1-\mu_0} + \frac{n(\mu_1+\mu_0)}{2}\right)$$

$$\frac{\sum_{i=1}^{n}x_i}{n} \ge \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

$$\bar{x} \ge \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

$$\bar{x} \ge k, \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

resulting in

$$t_c(x) = \begin{cases} 1 & \text{if } \bar{x} \ge k \\ 0 & \text{if } \bar{x} < k, \end{cases} \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$
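As a quick numerical sanity check (my own, not from the book), the simplified exponent from the derivation above can be compared against the log-likelihood ratio computed directly from the two normal densities; if the algebra is right, the two must agree for any sample:

```python
import numpy as np
from scipy import stats as st

m_0, m_1, variance, n = 0.0, 0.5, 1.0, 10
sigma = np.sqrt(variance)

# Any fixed sample will do; draw one from f0 with a seeded generator.
rng = np.random.default_rng(0)
x = rng.normal(m_0, sigma, size=n)

# Raw log-likelihood ratio: sum of log f1(x_i) minus sum of log f0(x_i).
llr_raw = (st.norm.logpdf(x, loc=m_1, scale=sigma).sum()
           - st.norm.logpdf(x, loc=m_0, scale=sigma).sum())

# Simplified exponent from the derivation above.
llr_simplified = (2 * (m_1 - m_0) * x.sum() + n * (m_0**2 - m_1**2)) / (2 * variance)

assert np.isclose(llr_raw, llr_simplified)
```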

In order to calculate α and β, we know that:

$$\alpha = \Pr_{f_0}\{t(x)=1\},$$
$$\beta = \Pr_{f_1}\{t(x)=0\}.$$

so,

$$\alpha = \Pr_{f_0}\{\bar{x} \ge k\}, \quad \beta = \Pr_{f_1}\{\bar{x} < k\}, \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

For α

$$\alpha = \Pr_{f_0}\{\bar{x} \ge k\} = \Pr_{f_0}\{\bar{x} - \mu_0 \ge k - \mu_0\}$$

$$\alpha = \Pr_{f_0}\left\{\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \ge \frac{k - \mu_0}{\sigma/\sqrt{n}}\right\}$$

$$\alpha = \Pr_{f_0}\left\{\text{z-score} \ge \frac{k - \mu_0}{\sigma/\sqrt{n}}\right\}, \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

so, I implemented the Python code below:

def alpha_calculation(cutoff, m_0, m_1, variance, sample_size):
    c = cutoff
    n = sample_size
    sigma = np.sqrt(variance)
    
    k = (c*variance)/(n*(m_1-m_0)) + (m_1+m_0)/2.0
    
    z_alpha = (k-m_0)/(sigma/np.sqrt(n))
    
    # Pr{z_score >= z_alpha}
    return 1.0 - st.norm(loc=0, scale=1).cdf(z_alpha)

For β

$$\beta = \Pr_{f_1}\{\bar{x} < k\} = \Pr_{f_1}\{\bar{x} - \mu_1 < k - \mu_1\}$$

$$\beta = \Pr_{f_1}\left\{\frac{\bar{x} - \mu_1}{\sigma/\sqrt{n}} < \frac{k - \mu_1}{\sigma/\sqrt{n}}\right\}$$

$$\beta = \Pr_{f_1}\left\{\text{z-score} < \frac{k - \mu_1}{\sigma/\sqrt{n}}\right\}, \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

resulting in the Python code below:

def beta_calculation(cutoff, m_0, m_1, variance, sample_size):
    c = cutoff
    n = sample_size
    sigma = np.sqrt(variance)
    
    k = (c*variance)/(n*(m_1-m_0)) + (m_1+m_0)/2.0
    
    z_beta = (k-m_1)/(sigma/np.sqrt(n))
    
    # Pr{z_score < z_beta}
    return st.norm(loc=0, scale=1).cdf(z_beta)

and the code …

alphas_calculated = []
betas_calculated = []
for cutoff in cutoffs:
    alpha_ = alpha_calculation(cutoff, 0.0, 0.5, 1.0, sample_size)
    beta_ = beta_calculation(cutoff, 0.0, 0.5, 1.0, sample_size)
    
    alphas_calculated.append(alpha_)
    betas_calculated.append(beta_)

and the code …

# Reproducing Figure 2.2 from calculation results.
plt.xlabel('$\\alpha$')
plt.ylabel('$\\beta$')
plt.xlim(-0.1, 1.05)
plt.ylim(-0.1, 1.05)
plt.axvline(x=0, color='b', linestyle='--')
plt.axvline(x=1, color='b', linestyle='--')
plt.axhline(y=0, color='b', linestyle='--')
plt.axhline(y=1, color='b', linestyle='--')
figure_2_2 = plt.plot(alphas_calculated, betas_calculated, 'ro', alphas_calculated, betas_calculated, 'k-')

to obtain a figure and values for α and β very similar to those from my first simulation:

[Plot of the calculated (α, β) pairs, matching the simulated plot.]

And finally to compare the results between simulation and calculation side by side …

import pandas as pd

df = pd.DataFrame({
    'cutoff': np.round(cutoffs, decimals=2), 
    'simulated alpha': np.round(alphas_simulated, decimals=2),
    'simulated beta': np.round(betas_simulated, decimals=2),
    'calculated alpha': np.round(alphas_calculated, decimals=2),
    'calculated beta': np.round(betas_calculated, decimals=2)
})
df

resulting in

[Table comparing the simulated and calculated values of α and β for each cutoff.]

This shows that the results of the simulation are very similar (if not the same) to those of the analytical approach.
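That agreement can also be checked in isolation. The sketch below (a self-contained variant of the code above, seeded and restricted to the α case) compares a Monte Carlo estimate against the closed form for c=0.4:

```python
import numpy as np
from scipy import stats as st

m_0, m_1, variance, n, c = 0.0, 0.5, 1.0, 10, 0.4
sigma = np.sqrt(variance)
k = (c * variance) / (n * (m_1 - m_0)) + (m_1 + m_0) / 2.0

# Closed-form alpha: Pr_f0{x-bar >= k}.
alpha_calc = 1.0 - st.norm.cdf((k - m_0) / (sigma / np.sqrt(n)))

# Monte Carlo alpha: fraction of f0 samples whose mean clears the threshold k.
rng = np.random.default_rng(42)
samples = rng.normal(m_0, sigma, size=(20000, n))
alpha_sim = np.mean(samples.mean(axis=1) >= k)

# With 20000 replicates the Monte Carlo standard error is about 0.0025,
# so the two estimates should agree to well within 0.02.
assert abs(alpha_sim - alpha_calc) < 0.02
```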

In short, I still need help figuring out what might be wrong in my calculations. Thanks. 🙂

Answer

On the website of the book Computer Age Statistical Inference, there is a discussion section where Trevor Hastie and Brad Efron often reply to questions. So, I posted this question there (as below) and received from Trevor Hastie the confirmation that there is an error in the book that will be fixed (in other words, my simulations and calculations, as implemented in Python in this question, are correct).

[Screenshot of my question and Trevor Hastie's reply on the book's discussion forum.]

Trevor Hastie's reply that "In fact c=.75 for that plot" means that in the figure below (the original Figure 2.2 from the book) the cutoff should be c=0.75 instead of c=0.4:

[Original Figure 2.2 from the book, where the point labeled c=0.4 should be labeled c=0.75.]

So, using my functions alpha_simulation(.), beta_simulation(.), alpha_calculation(.), and beta_calculation(.) (whose full Python code is available in this question), I got α=0.10 and β=0.38 for a cutoff c=0.75, as a confirmation that my code is correct.

alpha_simulated_c075 = alpha_simulation(0.75, f0_density, f1_density, sample_size, replicates)
beta_simulated_c075 = beta_simulation(0.75, f0_density, f1_density, sample_size, replicates)

alpha_calculated_c075 = alpha_calculation(0.75, 0.0, 0.5, 1.0, sample_size)
beta_calculated_c075 = beta_calculation(0.75, 0.0, 0.5, 1.0, sample_size)

print("Simulated: c=0.75, alpha={0:.2f}, beta={1:.2f}".format(alpha_simulated_c075, beta_simulated_c075))
print("Calculated: c=0.75, alpha={0:.2f}, beta={1:.2f}".format(alpha_calculated_c075, beta_calculated_c075))

[Output: both the simulated and the calculated versions give α=0.10 and β=0.38 for c=0.75.]

Finally, Trevor Hastie's remark "… resulting in a threshold for x of .4" means that k=0.4 in the equation below (see section B of this question):

$$\bar{x} \ge k, \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

resulting in

$$t_c(x) = \begin{cases} 1 & \text{if } \bar{x} \ge k \\ 0 & \text{if } \bar{x} < k, \end{cases} \quad \text{where } k = \frac{c\sigma^2}{n(\mu_1-\mu_0)} + \frac{\mu_1+\mu_0}{2}$$

So, in Python we can get k=0.4 for a cutoff c=0.75 as below:

n = 10
m_0 = 0.0
m_1 = 0.5
variance = 1.0
c = 0.75

k = (c*variance)/(n*(m_1-m_0)) + (m_1+m_0)/2.0
threshold_for_x = k

print("threshold for x (when cutoff c=0.75) = {0:.1f}".format(threshold_for_x))

[Output: threshold for x (when cutoff c=0.75) = 0.4]

Attribution
Source: Link, Question Author: Francisco Fonseca, Answer Author: Francisco Fonseca
