Is it better to difference a series (assuming it needs it) before using an Arima OR better to use the d parameter within Arima?
I was surprised how different the fitted values are depending on which route is taken with the same model and data. Or am I doing something incorrectly?
    install.packages("forecast")
    library(forecast)

    wineindT <- window(wineind, start = c(1987, 1), end = c(1994, 8))
    wineindT_diff <- diff(wineindT)

    # coefficients and other measures are similar
    modA <- Arima(wineindT, order = c(1, 1, 0))
    summary(modA)
    modB <- Arima(wineindT_diff, order = c(1, 0, 0))
    summary(modB)

    # fitted values from modA
    # (forecast() dispatches to the Arima method; forecast.Arima is not exported in recent package versions)
    A <- forecast(modA, h = 1)$fitted
    # fitted from modB, setting initial value to the first value in the original series
    B <- diffinv(forecast(modB, h = 1)$fitted, xi = wineindT[1])

    plot(A, col = "red")
    lines(B, col = "blue")
ADD:
Please note that I am differencing the series once and fitting an ARIMA(1,0,0) to it, and separately fitting an ARIMA(1,1,0) to the original series. I am (I think) reversing the differencing on the fitted values from the ARIMA(1,0,0) fitted to the differenced series.
I am comparing the fitted values – not the predictions.
Here is the plot (red is the ARIMA(1,1,0); blue is the ARIMA(1,0,0) on the differenced series after changing back to the original scale):
Response to Dr. Hyndman’s Answer:
1) Can you illustrate in R code what I would need to do to get the two sets of fitted values (and presumably forecasts) to match, allowing for small differences due to the first point in your answer, between Arima(1,1,0) on the original series and Arima(1,0,0) on the manually differenced series? I assume this has to do with the mean not being included in modA, but I am not entirely sure how to proceed.
2) Regarding your #3: I know I am missing the obvious, but are not $\hat{X}_t = X_{t-1} + \phi(X_{t-1} - X_{t-2})$ and $\hat{Y}_t = \phi(X_{t-1} - X_{t-2})$ the same when $\hat{Y}_t$ is defined as $\hat{X}_t - X_{t-1}$? Are you saying I am “undifferencing” incorrectly?
Answer
There are several issues here.

1) If you difference first, then Arima() will fit a model to the differenced data. If you let Arima() do the differencing as part of the estimation procedure, it will use a diffuse prior for the initialization. This is explained in the help file for arima(). So the results will be different due to the different ways the initial observation is handled. I don’t think it makes much difference in terms of the quality of the estimation. However, it is much easier to let Arima() handle the differencing if you want forecasts or fitted values on the original (undifferenced) data.
2) Apart from differences in estimation, your two models are not equivalent because modB includes a constant while modA does not. By default, Arima() includes a constant when $d=0$ and no constant when $d>0$. You can override these defaults with the include.mean argument.
3) Fitted values for the original data are not equivalent to the undifferenced fitted values on the differenced data. To see this, note that the fitted values on the original data are given by
$$\hat{X}_t = X_{t-1} + \phi(X_{t-1} - X_{t-2})$$
whereas the fitted values on the differenced data are given by
$$\hat{Y}_t = \phi(X_{t-1} - X_{t-2})$$
where $\{X_t\}$ is the original time series and $\{Y_t\}$ is the differenced series. Thus $$\hat{X}_t - \hat{X}_{t-1} \ne \hat{Y}_t.$$
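
As a rough sketch of how the points above might be combined in R (not taken from the original answer): the constant is dropped from the model on the differenced series with include.mean = FALSE, and the fitted values are put back on the original scale one step at a time as $\hat{X}_t = X_{t-1} + \hat{Y}_t$, rather than by cumulating the fitted differences with diffinv(). The names fitA, fitB and fitB_cumulated are made up for illustration, and the use of stats::lag() to align $X_{t-1}$ with time $t$ is one possible choice among several.

```r
library(forecast)  # provides Arima() and the wineind data set

wineindT <- window(wineind, start = c(1987, 1), end = c(1994, 8))
wineindT_diff <- diff(wineindT)

# Differencing handled inside Arima(); no constant by default when d > 0
modA <- Arima(wineindT, order = c(1, 1, 0))

# Manual differencing; include.mean = FALSE drops the constant so that
# the model specification matches modA
modB <- Arima(wineindT_diff, order = c(1, 0, 0), include.mean = FALSE)

# The AR coefficients should be close, but need not be identical,
# because the initial observation is handled differently
print(coef(modA))
print(coef(modB))

# Fitted values on the original scale from modA
fitA <- fitted(modA)

# One-step back-transform: X-hat_t = X_{t-1} + Y-hat_t.
# stats::lag(x, -1) shifts the series so that time t carries X_{t-1};
# adding two ts objects keeps only the time points they share.
fitB <- stats::lag(wineindT, -1) + fitted(modB)

# For contrast (assumed to reproduce the approach in the question):
# cumulating the fitted differences, which treats X-hat_{t-1} as if it were X_{t-1}
fitB_cumulated <- diffinv(fitted(modB), xi = wineindT[1])

plot(fitA, col = "red")
lines(fitB, col = "blue")
lines(fitB_cumulated, col = "grey")
```

If the red and blue lines turn out to be close, any remaining gap would presumably come from the initialization difference described in the first point. The grey line illustrates the third point: cumulating fitted differences is not the same as adding each fitted difference to the previous observed value.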
Attribution
Source: Link, Question Author: B_Miner, Answer Author: Rob Hyndman