I have two time-series:

- A proxy for the market risk premium (ERP; red line)
- The risk-free rate, proxied by a government bond (blue line)

I want to test whether the risk-free rate can explain the ERP. In doing so, I basically followed the advice of Tsay (2010, *Analysis of Financial Time Series*, 3rd edition, p. 96):

- Fit the linear regression model and check serial correlations of the residuals.
- If the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables.
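
A minimal sketch of these two steps, run on synthetic data rather than the series in this question. The helper names (`ols`, `acf`, `diff`), the slope of -0.9, and the random-walk error are all assumptions made for illustration; it uses only the Python standard library.

```python
import random

def ols(x, y):
    """Simple least-squares fit y = a + b*x; returns (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid

def acf(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((s - m) ** 2 for s in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom

def diff(series):
    """First difference: series[t] - series[t-1]."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

# Synthetic data (assumption): x is a random walk and the regression
# error is a random walk too, so the levels residuals have a unit root.
random.seed(0)
n = 500
x, e = [0.0], [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 0.3))
    e.append(e[-1] + random.gauss(0, 0.3))
y = [6.0 - 0.9 * xi + ei for xi, ei in zip(x, e)]

# Step 1: regression in levels, then check residual serial correlation.
_, b_levels, resid_levels = ols(x, y)
rho_levels = acf(resid_levels, 1)

# Step 2: regression in first differences, then check again.
_, b_diff, resid_diff = ols(diff(x), diff(y))
rho_diff = acf(resid_diff, 1)

print("levels:      slope", b_levels, "lag-1 ACF", rho_levels)
print("differences: slope", b_diff, "lag-1 ACF", rho_diff)
```

In this setup the levels residuals should show a lag-1 autocorrelation close to one, while the residuals of the differenced regression should be roughly uncorrelated. A formal unit-root test on the residuals is not included in the sketch.
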

Doing the first step, I get the following results:

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)     6.77019    0.25103   26.97   <2e-16 ***
    Risk_Free_Rate -0.65320    0.04123  -15.84   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As expected from the figure, the relation is negative and significant. However, the residuals are serially correlated:

Therefore, I first-difference both the dependent and explanatory variables. Here is what I get:

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)    -0.002077   0.016497  -0.126      0.9
    Risk_Free_Rate -0.958267   0.053731 -17.834   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And the ACF of the residuals looks like:

This result looks great: First, the residuals are now uncorrelated. Second, the relation seems to be more negative now.

Here are my questions (you have probably been wondering by now 😉):

- I would have interpreted the first regression (econometric problems aside) as “if the risk-free rate rises by one percentage point, the ERP falls by 0.65 percentage points.” After pondering this for a while, I would interpret the second regression in just the same way (now with a fall of 0.96 percentage points, though). Is this interpretation correct? It feels weird that I transform my variables but don’t have to change my interpretation.
- If this is correct, why do the results change? Is this just the result of econometric problems? If so, does anyone have an idea why my second regression seems to be even “better”? Normally, I read that spurious correlations vanish once you do things correctly. Here, it seems to be the other way round.

**Answer**

Suppose that we have the model

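$$y_t = \beta_0 + \beta_1 x_t + \beta_2 t + \epsilon_t.$$

You say that these coefficients are easier to interpret. Let’s subtract $y_{t-1}$ from the left-hand side and $\beta_0 + \beta_1 x_{t-1} + \beta_2 (t-1) + \epsilon_{t-1}$, which equals $y_{t-1}$, from the right-hand side. We have

$$\Delta y_t = \beta_1 \Delta x_t + \beta_2 + \Delta \epsilon_t.$$

The intercept in the differenced equation is the coefficient on the time trend, $\beta_2$. And the coefficient on $\Delta x_t$ has the same interpretation as $\beta_1$ in the original model.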
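
As a rough check of this algebra, the sketch below simulates the levels model and fits the differenced regression. The parameter values, the random-walk regressor, and the random-walk error are assumptions chosen for illustration (so that the differenced error is white noise), and `ols` is a hypothetical helper, not a library function.

```python
import random

def ols(x, y):
    """Least-squares fit y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

random.seed(1)
beta0, beta1, beta2 = 2.0, -0.9, 0.5   # assumed "true" parameters
n = 2000

# Assumption: x and the error are both random walks.
x, eps = [0.0], [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 1.0))
    eps.append(eps[-1] + random.gauss(0, 0.5))

# Levels model: y_t = beta0 + beta1 * x_t + beta2 * t + eps_t
y = [beta0 + beta1 * x[t] + beta2 * t + eps[t] for t in range(n)]

# Differenced model: dy_t = beta1 * dx_t + beta2 + d(eps)_t
dx = [x[t] - x[t - 1] for t in range(1, n)]
dy = [y[t] - y[t - 1] for t in range(1, n)]
intercept, slope = ols(dx, dy)

print("slope on dx:", slope, "(beta1 =", beta1, ")")
print("intercept:  ", intercept, "(beta2 =", beta2, ")")
```

The slope should land near `beta1` and the intercept near `beta2`, while `beta0` drops out of the differenced equation entirely.
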

If the errors were non-stationary such that

$$\epsilon_t = \sum_{s=0}^{t-1} \nu_s,$$

where $\nu_s$ is white noise, then the differenced error is white noise.
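
To spell out that step with the same symbols: differencing the cumulated sum leaves only its last term,

$$\Delta \epsilon_t = \epsilon_t - \epsilon_{t-1} = \sum_{s=0}^{t-1} \nu_s - \sum_{s=0}^{t-2} \nu_s = \nu_{t-1},$$

which is white noise by assumption.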

If the errors instead follow a stationary AR(p) process, say, then the differenced error term has a more complicated distribution and, notably, retains serial correlation. And if the original $\epsilon_t$ are already white noise (an AR(1) with a correlation coefficient of 0, if you like), then differencing induces serial correlation between the errors.
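
The white-noise case can be checked numerically. For white noise with variance $\sigma^2$, the differenced series has variance $2\sigma^2$ and lag-1 autocovariance $-\sigma^2$, so its lag-1 autocorrelation is $-1/2$. The sketch below is a small simulation under assumed Gaussian white noise; `acf1` is a hypothetical helper.

```python
import random

def acf1(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n))
    den = sum((s - m) ** 2 for s in series)
    return num / den

random.seed(2)

# Assumption: the original errors are Gaussian white noise.
eps = [random.gauss(0, 1) for _ in range(5000)]

# Over-differencing: take first differences of an already stationary series.
d_eps = [eps[t] - eps[t - 1] for t in range(1, len(eps))]

rho_levels = acf1(eps)    # should be close to 0
rho_diff = acf1(d_eps)    # should be close to -0.5

print("lag-1 ACF, original errors:   ", rho_levels)
print("lag-1 ACF, differenced errors:", rho_diff)
```
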

For these reasons, it is important to only difference processes that are non-stationary due to unit roots and use detrending for so-called trend stationary ones.

(A unit root causes the variance of a series to grow, and in fact explode, over time; the expected value of the series is constant, however. A trend-stationary process has the opposite properties: its mean changes over time while its variance stays constant.)
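
A short worked version of this contrast, assuming the $\nu_s$ are uncorrelated with mean zero and variance $\sigma^2$, and writing $u_t$ (a symbol introduced here for illustration) for a stationary mean-zero error with variance $\sigma_u^2$. For the unit-root error defined above:

$$E[\epsilon_t] = \sum_{s=0}^{t-1} E[\nu_s] = 0, \qquad \operatorname{Var}(\epsilon_t) = \sum_{s=0}^{t-1} \operatorname{Var}(\nu_s) = t\,\sigma^2.$$

For a trend-stationary series $y_t = \beta_0 + \beta_2 t + u_t$:

$$E[y_t] = \beta_0 + \beta_2 t, \qquad \operatorname{Var}(y_t) = \sigma_u^2.$$

The first has a constant mean and a variance that grows linearly in $t$; the second has a mean that moves with $t$ and a constant variance.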

**Attribution**

*Source: Link, Question Author: Christoph_J, Answer Author: Charlie*