Let’s start by assuming that I have cross-sectional data on y, x1 and x2.
I want to estimate the effect of x1, x2 and their interaction (x3 = x1*x2) on y using the control function approach, and x1 and x2 are very likely endogenous. I have two instruments, z1 and z2. I estimate the following two first-stage equations and save the residuals as follows:
    // First stages: OLS of each endogenous variable on the instruments
    regress x1 z1 z2
    predict error1hat, residuals
    regress x2 z1 z2
    predict error2hat, residuals
Once I have saved the residuals, I estimate the second-stage equation as follows:
    // Second stage: include both control functions
    generate x3 = x1*x2
    regress y x1 x2 x3 error1hat error2hat
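To make the two steps concrete, here is a minimal sketch in Python rather than Stata. It runs on simulated data, so the data-generating process, its true coefficients and the helper names (`ols`, `resid`, `control_function`, `simulate`) are hypothetical. OLS is computed from the normal equations by hand so that only the standard library is needed.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda m: abs(A[m][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for m in range(k):
            if m != c:
                f = A[m][c] / A[c][c]
                A[m] = [a - f * b for a, b in zip(A[m], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def resid(X, y, b):
    """Residuals y - Xb."""
    return [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]

def control_function(y, x1, x2, z1, z2):
    """Step 1: OLS first stages. Step 2: OLS of y on x1, x2, x3 = x1*x2 and both residuals."""
    Z = [[1.0, a, b] for a, b in zip(z1, z2)]
    v1 = resid(Z, x1, ols(Z, x1))  # error1hat
    v2 = resid(Z, x2, ols(Z, x2))  # error2hat
    X = [[1.0, p, q, p * q, r, s] for p, q, r, s in zip(x1, x2, v1, v2)]
    return ols(X, y)  # [_cons, x1, x2, x3, error1hat, error2hat]

def simulate(n, seed):
    """Hypothetical DGP: x1, x2 endogenous; true coefficients 1, 1, -1, 0.5 and rho = 0.5, -0.5."""
    g = random.Random(seed)
    y, x1, x2, z1, z2 = [], [], [], [], []
    for _ in range(n):
        a, b = g.gauss(0, 1), g.gauss(0, 1)
        v1, v2, e = g.gauss(0, 1), g.gauss(0, 1), g.gauss(0, 1)
        u = 0.5 * v1 - 0.5 * v2 + e  # structural error correlated with v1, v2
        p = 1 + a + 0.5 * b + v1
        q = 1 - 0.5 * a + b + v2
        y.append(1 + p - q + 0.5 * p * q + u)
        x1.append(p)
        x2.append(q)
        z1.append(a)
        z2.append(b)
    return y, x1, x2, z1, z2

beta = control_function(*simulate(2000, seed=1))
print([round(b, 2) for b in beta])
```

Because x3 is a function of x1 and x2, conditioning on both first-stage errors also takes care of the interaction in this setup. No separate first stage for x3 is needed.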
Even though the estimated coefficients of x1, x2 and x3 make sense, I know that the standard errors are not valid, because error1hat and error2hat are generated regressors (see page 8 of http://eml.berkeley.edu/~train/petrintrain.pdf).
On page 8 of http://eml.berkeley.edu/~train/petrintrain.pdf, the authors suggest using the bootstrap to obtain corrected standard errors for x1, x2 and x3.
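In equation form, using my own notation, the two steps correspond to the model below. This assumes the usual linear control-function condition $E[u_i \mid z_i, v_{1i}, v_{2i}] = \rho_1 v_{1i} + \rho_2 v_{2i}$, where $u_i$ is the structural error:

$$
\begin{aligned}
x_{ji} &= \gamma_{j0} + \gamma_{j1} z_{1i} + \gamma_{j2} z_{2i} + v_{ji}, \qquad j = 1, 2,\\
y_i &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \rho_1 v_{1i} + \rho_2 v_{2i} + e_i .
\end{aligned}
$$

The second stage replaces the unobserved $v_{ji}$ with $\hat v_{ji} = x_{ji} - \hat\gamma_{j0} - \hat\gamma_{j1} z_{1i} - \hat\gamma_{j2} z_{2i}$ (error1hat and error2hat). The default second-stage standard errors treat $\hat v_{ji}$ as fixed data and ignore the sampling error in $\hat\gamma_j$.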
My questions are:
- How should I set up the bootstrap?
- Is the bootstrap applied only to the second-stage equation, or to both the first-stage and second-stage equations?
Now, let’s assume that I have panel data on y, x1 and x2. First, I use the within-group transformation to remove the unobserved heterogeneity. Then I estimate the parameters with the control function approach as if the data were cross-sectional, as above. Do I need to make any additional adjustments for panel data compared with the cross-sectional case?
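The within-group step can be sketched in Python (not Stata) as subtracting each unit's mean from every variable. This includes the instruments and x3. One point to watch with the interaction: if the model is $y_{it} = \alpha_i + \beta_1 x_{1it} + \beta_2 x_{2it} + \beta_3 x_{1it} x_{2it} + u_{it}$, demeaning yields the demeaned product. That is not the product of the separately demeaned x1 and x2. The function name `within` and the toy numbers below are hypothetical.

```python
def within(values, ids):
    """Within-group transformation: subtract each unit's mean from its observations."""
    sums, counts = {}, {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for v, i in zip(values, ids)]

ids = [1, 1, 2, 2]
x1 = [1.0, 3.0, 10.0, 20.0]
x2 = [2.0, 4.0, 1.0, 3.0]
# Build x3 = x1*x2 in levels, then demean it like every other variable.
x3_dm = within([a * b for a, b in zip(x1, x2)], ids)
# A different quantity: the product of the separately demeaned x1 and x2.
x3_wrong = [a * b for a, b in zip(within(x1, ids), within(x2, ids))]
print(x3_dm)     # [-5.0, 5.0, -25.0, 25.0]
print(x3_wrong)  # [1.0, 1.0, 5.0, 5.0]
```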
Cameron and Trivedi (Microeconometrics Using Stata) discuss different bootstrap techniques and show Stata code, for example for Heckman’s two-step estimator.
Regarding question 2: the bootstrap is indeed applied to both the first-stage and the second-stage equations. You can also bootstrap only the second stage, but then you have to make further assumptions about the distribution of your predicted variables (parametric bootstrap). That said, it is much simpler to bootstrap both stages together.
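A minimal Python (not Stata) sketch of bootstrapping both stages follows. It is a pairs bootstrap: rows are resampled with replacement, and in every replication both first stages are re-run, fresh residuals are computed, and the second stage is re-estimated. The standard errors are the standard deviations of the coefficients across replications. The data are simulated and all function names are hypothetical. In Stata, the analogous setup would wrap both stages in one program and pass it to `bootstrap`.

```python
import random
import statistics

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda m: abs(A[m][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for m in range(k):
            if m != c:
                f = A[m][c] / A[c][c]
                A[m] = [a - f * b for a, b in zip(A[m], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def resid(X, y, b):
    """Residuals y - Xb."""
    return [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]

def control_function(y, x1, x2, z1, z2):
    """Both stages: first-stage OLS residuals, then OLS of y on x1, x2, x1*x2 and the residuals."""
    Z = [[1.0, a, b] for a, b in zip(z1, z2)]
    v1 = resid(Z, x1, ols(Z, x1))
    v2 = resid(Z, x2, ols(Z, x2))
    X = [[1.0, p, q, p * q, r, s] for p, q, r, s in zip(x1, x2, v1, v2)]
    return ols(X, y)

def simulate(n, seed):
    """Hypothetical DGP with endogenous x1, x2 (true slopes 1, -1, 0.5)."""
    g = random.Random(seed)
    cols = ([], [], [], [], [])
    for _ in range(n):
        a, b = g.gauss(0, 1), g.gauss(0, 1)
        v1, v2, e = g.gauss(0, 1), g.gauss(0, 1), g.gauss(0, 1)
        p = 1 + a + 0.5 * b + v1
        q = 1 - 0.5 * a + b + v2
        y = 1 + p - q + 0.5 * p * q + 0.5 * v1 - 0.5 * v2 + e
        for col, val in zip(cols, (y, p, q, a, b)):
            col.append(val)
    return cols

def bootstrap_se(data, reps, seed):
    """Pairs bootstrap: resample rows, then redo BOTH stages in every replication."""
    g = random.Random(seed)
    n = len(data[0])
    draws = []
    for _ in range(reps):
        idx = [g.randrange(n) for _ in range(n)]
        draws.append(control_function(*[[col[i] for i in idx] for col in data]))
    return [statistics.stdev([d[j] for d in draws]) for j in range(len(draws[0]))]

data = simulate(400, seed=2)
beta = control_function(*data)
se = bootstrap_se(data, reps=100, seed=3)
for name, b, s in zip(["_cons", "x1", "x2", "x3", "error1hat", "error2hat"], beta, se):
    print(f"{name:>10} {b:8.3f} {s:8.3f}")
```

The 100 replications only keep the example fast. In real applications, more replications are usually advisable.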
Regarding question 1:
Here is also a short, freely available overview that discusses some of the topics covered in the Cameron and Trivedi book.
I have to say that I find the topic confusing, particularly when there are several first stages. I also have an open question on this here, so far without answers.
Update: Sorry, I forgot to comment on the case of panel data. In that case, I would use cluster-robust standard errors in each stage of the two-stage bootstrap.
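The sketch below shows one way to respect the panel structure inside the bootstrap, in Python (not Stata) with simulated data. The data-generating process and function names are hypothetical. It swaps in a cluster (block) bootstrap: whole panel units are resampled with replacement rather than individual observations, so the within-unit dependence is carried into every replication. In Stata, the analogous options are `cluster()` together with `idcluster()` in `bootstrap`. x3 is built in levels and then demeaned within each unit. Because demeaning uses only a unit's own rows, each unit can be demeaned once before resampling.

```python
import random
import statistics

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda m: abs(A[m][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for m in range(k):
            if m != c:
                f = A[m][c] / A[c][c]
                A[m] = [a - f * b for a, b in zip(A[m], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def resid(X, y, b):
    """Residuals y - Xb."""
    return [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]

def within(unit):
    """Within transformation of one unit's block: subtract the unit mean from every column."""
    means = [sum(col) / len(unit) for col in zip(*unit)]
    return [[v - m for v, m in zip(row, means)] for row in unit]

def simulate_panel(n_units, t, seed):
    """Hypothetical DGP: unit effect a enters y and is correlated with x1, x2 (true slopes 1, -1, 0.5)."""
    g = random.Random(seed)
    units = []
    for _ in range(n_units):
        a = g.gauss(0, 1)
        block = []
        for _ in range(t):
            z1, z2 = g.gauss(0, 1), g.gauss(0, 1)
            v1, v2, e = g.gauss(0, 1), g.gauss(0, 1), g.gauss(0, 1)
            x1 = 0.5 * a + z1 + 0.5 * z2 + v1
            x2 = -0.5 * a - 0.5 * z1 + z2 + v2
            y = a + x1 - x2 + 0.5 * x1 * x2 + 0.5 * v1 - 0.5 * v2 + e
            block.append([y, x1, x2, x1 * x2, z1, z2])  # x3 built in levels
        units.append(within(block))  # demean each unit once, x3 included
    return units

def cf_within(units):
    """Both control-function stages on demeaned rows (y, x1, x2, x3, z1, z2); no constant needed."""
    rows = [r for u in units for r in u]
    Z = [[r[4], r[5]] for r in rows]
    y, x1, x2 = [r[0] for r in rows], [r[1] for r in rows], [r[2] for r in rows]
    v1 = resid(Z, x1, ols(Z, x1))
    v2 = resid(Z, x2, ols(Z, x2))
    X = [[r[1], r[2], r[3], p, q] for r, p, q in zip(rows, v1, v2)]
    return ols(X, y)  # [x1, x2, x3, error1hat, error2hat]

def cluster_bootstrap_se(units, reps, seed):
    """Resample whole units with replacement and redo both stages in every replication."""
    g = random.Random(seed)
    draws = [cf_within([g.choice(units) for _ in units]) for _ in range(reps)]
    return [statistics.stdev([d[j] for d in draws]) for j in range(len(draws[0]))]

units = simulate_panel(100, 5, seed=4)
beta = cf_within(units)
se = cluster_bootstrap_se(units, reps=100, seed=5)
for name, b, s in zip(["x1", "x2", "x3", "error1hat", "error2hat"], beta, se):
    print(f"{name:>10} {b:8.3f} {s:8.3f}")
```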
PS: Stata has a fairly detailed help file on bootstrapping (help bootstrap), which is also worth checking.