I have data on a large Italian firm’s employees over ten years and I would like to see how the gender gap in male-female earnings has changed over time. For this purpose I run pooled OLS:

$$
y_{it} = X'_{it}\beta + \delta {\rm male}_i + \sum^{10}_{t=1}\gamma_t d_t + \varepsilon_{it}
$$

where $y$ is log earnings per year, $X_{it}$ includes covariates that vary across individuals and over time, $d_t$ are year dummies, and ${\rm male}_i$ equals one if a worker is male and zero otherwise. I am concerned that some of the covariates may be correlated with unobserved fixed effects. But when I use the fixed effects (within) estimator or first differences, I lose the gender dummy because this variable does not change over time. I don’t want to use the random effects estimator because I often hear that its assumptions are very unrealistic and unlikely to hold.

Is there a way to keep the gender dummy and control for fixed effects at the same time? If so, do I need to cluster or deal with other problems in the errors for hypothesis tests on the gender variable?

**Answer**

There are a few potential ways for you to keep the gender dummy in a fixed effects regression.

**Within Estimator**

Suppose you have a model similar to your pooled OLS model:

$$y_{it} = \beta_1 + \sum^{10}_{t=2} \beta_t d_t + \gamma_1 (male_i) + \sum^{10}_{t=2} \gamma_t (d_t \cdot male_i) + X'_{it}\theta + c_i + \epsilon_{it}$$

where the variables are as before. Note that $\beta_1$ and $\beta_1 + \gamma_1 (male_i)$ cannot be identified because the within estimator cannot distinguish them from the fixed effect $c_i$. Since $\beta_1$ is the intercept for the base year $t=1$, $\gamma_1$ is the gender effect on earnings in that period. What we can identify are $\gamma_2, \dots, \gamma_{10}$: because they are interacted with the time dummies, they measure the difference in the partial effect of the gender variable relative to the first period. So if $\gamma_2, \dots, \gamma_{10}$ increase over time, this indicates a widening of the earnings gap between men and women.
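To see why the gender dummy drops out while its interactions with the year dummies survive, here is a minimal sketch of the within transformation on a made-up toy panel (two workers, three years; the worker labels and the choice of year 3 are purely illustrative assumptions):

```python
# Toy panel: 2 workers observed in 3 years (all values are made up).
years = [1, 2, 3]
male = {"a": 1, "b": 0}  # time-invariant gender dummy per worker


def demean(values):
    """Within transformation: subtract the individual's time average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


within = {}
for worker, m in male.items():
    male_path = [m for _ in years]                           # male_i, constant over t
    inter_path = [m * (1 if t == 3 else 0) for t in years]   # d_3 * male_i
    within[worker] = {
        "male": demean(male_path),        # all zeros: not identified
        "d3_x_male": demean(inter_path),  # still varies over t for men
    }

print(within["a"])
```

After demeaning, the `male` column is zero for every worker, so no coefficient on it can be estimated, whereas the interaction column still varies over time for the male worker.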

**First-Difference Estimator**

If you want a single summary of how the difference between men and women changes over time, you can try the following model:

$$y_{it} = \beta_1 + \sum^{10}_{t=2} \beta_t d_t + \gamma (t\cdot male_i) + X'_{it}\theta + c_i + \epsilon_{it}$$

where the variable $t = 1, 2, \dots, 10$ is interacted with the time-invariant gender dummy. Now if you take first differences, $\beta_1$ and $c_i$ drop out and you get

$$y_{it} - y_{i(t-1)} = \sum^{10}_{t=2} \beta_t (d_t - d_{(t-1)}) + \gamma (t\cdot male_i - [(t-1)male_i]) + (X'_{it}-X'_{i(t-1)})\theta + \epsilon_{it}-\epsilon_{i(t-1)}$$

Then $\gamma(t\cdot male_i - [(t-1)male_i]) = \gamma[(t - (t-1))\cdot male_i] = \gamma (male_i)$, so you can identify $\gamma$, which measures the average change per year in the earnings gap between men and women. The final regression equation is:

$$\Delta y_{it} = \sum_{t=2}^{10}\beta_t \Delta d_t + \gamma(male_i) + \Delta X'_{it}\theta + \Delta \epsilon_{it}$$

and you get your effect of interest. The nice thing is that this is easily implemented in any statistical software but you lose a time period.
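As a rough illustration of the first-difference idea, the sketch below simulates a stripped-down version of the model with no covariates and no year effects (both omissions, along with all parameter values, are simplifying assumptions made here, not part of the model above). In that special case, regressing $\Delta y_{it}$ on $male_i$ without a constant reduces to averaging $\Delta y_{it}$ among men.

```python
import random

random.seed(0)
GAMMA = 0.02     # assumed coefficient on t * male_i
N, T = 200, 10   # hypothetical number of workers and years

diffs_male, diffs_female = [], []
for i in range(N):
    male = i % 2                   # alternate men and women
    c_i = random.gauss(0, 1)       # unobserved fixed effect
    y = [c_i + GAMMA * t * male + random.gauss(0, 0.1)
         for t in range(1, T + 1)]
    # First differences: c_i cancels, t * male_i becomes male_i.
    dy = [y[t] - y[t - 1] for t in range(1, T)]
    (diffs_male if male else diffs_female).extend(dy)

# OLS of dy on the male dummy alone (no constant) is the mean of dy among men.
gamma_hat = sum(diffs_male) / len(diffs_male)
print(round(gamma_hat, 3))
```

Each worker contributes $T-1 = 9$ differenced observations, which is the lost time period mentioned above. The fixed effect has a large variance in the simulation, yet it does not affect `gamma_hat` because it cancels in every difference.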

**Hausman-Taylor Estimator**

This estimator distinguishes between regressors that you can assume to be uncorrelated with the fixed effect $c_i$ and those that are potentially correlated with it. It further distinguishes between time-varying and time-invariant variables. Let $1$ denote variables that are uncorrelated with $c_i$ and $2$ those that are correlated with it, and let’s say your gender variable is the only time-invariant variable. The Hausman-Taylor estimator then applies the random effects transformation:

$$\tilde{y}_{it} = \tilde{X}'_{1it}\theta_1 + \tilde{X}'_{2it}\theta_2 + \gamma (\widetilde{male}_{i2}) + \tilde{c}_i + \tilde{\epsilon}_{it}$$

where the tilde denotes the random effects transformation, e.g. $\tilde{X}_{1it} = X_{1it} - \hat{\theta}_i \overline{X}_{1i}$. Here $\hat{\theta}_i$ is the scalar weight used in the random effects transformation (not to be confused with the coefficient vectors $\theta_1$ and $\theta_2$) and $\overline{X}_{1i}$ is the time average for each individual. This isn’t like the usual random effects estimator that you wanted to avoid, because the group $2$ variables are instrumented in order to remove the correlation with $c_i$. For $\tilde{X}_{2it}$ the instrument is $X_{2it} - \overline{X}_{2i}$. The same is done for the time-invariant variables: if you specify the gender variable as potentially correlated with the fixed effect, it is instrumented with $\overline{X}_{1i}$, so you need at least as many time-varying variables in group $1$ as time-invariant variables in group $2$.
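For reference, the weight $\hat{\theta}_i$ in the random effects transformation is usually built from the estimated variance components. A standard textbook form is given below, where $\hat{\sigma}^2_\epsilon$ and $\hat{\sigma}^2_c$ denote the estimated variances of $\epsilon_{it}$ and $c_i$ and $T_i$ is the number of periods observed for individual $i$ (these three symbols are introduced here for illustration and do not appear elsewhere in this answer):

$$\hat{\theta}_i = 1 - \sqrt{\frac{\hat{\sigma}^2_\epsilon}{\hat{\sigma}^2_\epsilon + T_i \hat{\sigma}^2_c}}$$

When $\hat{\theta}_i = 0$ the transformation leaves the data unchanged, which corresponds to pooled OLS, and as $\hat{\theta}_i$ approaches $1$ it approaches the within transformation.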

All of this might sound a little complicated but there are canned packages for this estimator. For instance, in Stata the corresponding command is `xthtaylor`. For further information on this method you could read Cameron and Trivedi (2009) “Microeconometrics Using Stata”. Otherwise you can just stick with the two previous methods, which are a bit easier.

**Inference**

For your hypothesis tests there is not much to consider beyond what you would need to do anyway in a fixed effects regression. You need to account for autocorrelation in the errors, for example by clustering on the individual ID variable. This allows for an arbitrary correlation structure within clusters (individuals), which deals with autocorrelation. For a reference see again Cameron and Trivedi (2009).
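To make the effect of clustering concrete, here is a minimal sketch of a cluster-robust standard error for a one-regressor model $y = bx + u$ with no intercept and no small-sample correction. The function name `cluster_se` and the tiny data set are hypothetical, and statistical packages typically apply additional finite-sample adjustments, so this shows only the basic sandwich idea.

```python
from collections import defaultdict
from math import sqrt


def cluster_se(x, y, ids):
    """Slope and cluster-robust SE for y = b*x + u (no intercept,
    no small-sample correction)."""
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Sum the scores x * residual within each cluster before squaring,
    # so any correlation pattern inside a cluster is allowed.
    scores = defaultdict(float)
    for xi, yi, g in zip(x, y, ids):
        scores[g] += xi * (yi - b * xi)
    meat = sum(s * s for s in scores.values())
    return b, sqrt(meat) / sxx


# Two workers with two observations each; residuals share a sign within worker.
x = [1.0, 1.0, 1.0, 1.0]
y = [3.0, 3.0, 1.0, 1.0]
b, se_cluster = cluster_se(x, y, ids=["w1", "w1", "w2", "w2"])
_, se_indep = cluster_se(x, y, ids=[0, 1, 2, 3])  # every row its own cluster
print(b, se_cluster, se_indep)
```

Treating every observation as its own cluster gives the smaller standard error, while clustering on the worker gives the larger one, because the residuals are positively correlated within each worker in this toy data.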

**Attribution**

*Source: Link, Question Author: user42263, Answer Author: Andy*