I have moderate background in time series forecasting. I have looked at several forecasting books, and I don’t see the following questions addressed in any of them.

I have two questions:

How would I determine objectively (via a statistical test) whether a given time series has:

- stochastic seasonality or deterministic seasonality
- a stochastic trend or a deterministic trend

What would happen if I model my time series as having a deterministic trend/seasonality when the series has a clearly stochastic component?

Any help addressing these questions would be greatly appreciated.

Example data for trend:

`7,657 5,451 10,883 9,554 9,519 10,047 10,663 10,864 11,447 12,710 15,169 16,205 14,507 15,400 16,800 19,000 20,198 18,573 19,375 21,032 23,250 25,219 28,549 29,759 28,262 28,506 33,885 34,776 35,347 34,628 33,043 30,214 31,013 31,496 34,115 33,433 34,198 35,863 37,789 34,561 36,434 34,371 33,307 33,295 36,514 36,593 38,311 42,773 45,000 46,000 42,000 47,000 47,500 48,000 48,500 47,000 48,900`

**Answer**

1) As regards your first question, some test statistics have been developed and discussed in the literature to test the null of stationarity and the null of a unit root. Some of the many papers written on this issue are the following:

Related to the trend:

- Dickey, D. and Fuller, W. (1979a), Distribution of the estimators for autoregressive time series with a unit root, Journal of the American Statistical Association 74, 427-431.
- Dickey, D. and Fuller, W. (1981), Likelihood ratio statistics for autoregressive time series with a unit root, Econometrica 49, 1057-1071.
- Kwiatkowski, D., Phillips, P., Schmidt, P. and Shin, Y. (1992), Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?, Journal of Econometrics 54, 159-178.
- Phillips, P. and Perron, P. (1988), Testing for a unit root in time series regression, Biometrika 75, 335-346.
- Durlauf, S. and Phillips, P. (1988), Trends versus random walks in time series analysis, Econometrica 56, 1333-1354.

Related to the seasonal component:

- Hylleberg, S., Engle, R., Granger, C. and Yoo, B. (1990), Seasonal integration and cointegration, Journal of Econometrics 44, 215-238.
- Canova, F. and Hansen, B. E. (1995), Are seasonal patterns constant over time? A test for seasonal stability, Journal of Business and Economic Statistics 13, 237-252.
- Franses, P. (1990), Testing for seasonal unit roots in monthly data, Technical Report 9032, Econometric Institute.
- Ghysels, E., Lee, H. and Noh, J. (1994), Testing for unit roots in seasonal time series: some theoretical extensions and a Monte Carlo investigation, Journal of Econometrics 62, 415-442.

The textbook

Banerjee, A., Dolado, J., Galbraith, J. and Hendry, D. (1993), Co-Integration, Error Correction, and the Econometric Analysis of Non-Stationary Data, Advanced Texts in Econometrics, Oxford University Press

is also a good reference.
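For the seasonal component, implementations of the tests referenced above exist in R. A minimal sketch (not part of the original answer), assuming the CRAN package `uroot` is available with its `hegy.test` and `ch.test` functions:

```r
# Sketch: seasonal unit root (HEGY) and seasonal stability (Canova-Hansen)
# tests, assuming the CRAN package "uroot" is installed.
set.seed(1)
# toy quarterly series: a stable deterministic seasonal pattern plus noise
y <- ts(rep(c(10, 20, 5, 15), 25) + rnorm(100), frequency = 4)

if (requireNamespace("uroot", quietly = TRUE)) {
  print(uroot::hegy.test(y))  # null: seasonal unit roots (stochastic seasonality)
  print(uroot::ch.test(y))    # null: stable, deterministic seasonal pattern
}
```

For a series like this one, built from a fixed seasonal pattern, the HEGY test would be expected to reject seasonal unit roots while the Canova-Hansen test would not reject stability.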

2) Your second concern is justified by the literature. If there is a unit root, then the traditional t-statistic that you would apply to a linear trend does not follow the standard distribution. See, for example,

Phillips, P. (1987), Time series regression with a unit root, Econometrica 55(2), 277-301.

If a unit root exists and is ignored, then the probability of rejecting the null that the coefficient of a linear trend is zero is inflated. That is, we would end up modelling a deterministic linear trend too often for a given significance level. In the presence of a unit root, we should instead transform the data by taking regular differences.
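This over-rejection is easy to reproduce with a small simulation (a base-R sketch, not part of the original answer): generate driftless random walks, regress each on a time trend, and count how often the usual t-test declares the trend coefficient significant.

```r
# Spurious deterministic trend: a driftless random walk has no trend, yet the
# standard t-test on a fitted linear trend rejects far more often than 5%.
set.seed(123)
n <- 100; nrep <- 500
tt <- seq_len(n)
rej <- replicate(nrep, {
  y <- cumsum(rnorm(n))  # unit-root process, no deterministic trend
  abs(summary(lm(y ~ tt))$coefficients["tt", "t value"]) > 1.96
})
mean(rej)  # empirical rejection rate, far above the nominal 0.05
```

The rejection rate also grows with the sample size, which is the hallmark of a spurious regression.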

3) For illustration, if you use R you can do the following analysis with your data.

```
x <- structure(c(7657, 5451, 10883, 9554, 9519, 10047, 10663, 10864,
11447, 12710, 15169, 16205, 14507, 15400, 16800, 19000, 20198,
18573, 19375, 21032, 23250, 25219, 28549, 29759, 28262, 28506,
33885, 34776, 35347, 34628, 33043, 30214, 31013, 31496, 34115,
33433, 34198, 35863, 37789, 34561, 36434, 34371, 33307, 33295,
36514, 36593, 38311, 42773, 45000, 46000, 42000, 47000, 47500,
48000, 48500, 47000, 48900), .Tsp = c(1, 57, 1), class = "ts")
```

First, you can apply the Dickey-Fuller test for the null of a unit root:

```
require(tseries)
adf.test(x, alternative = "explosive")
# Augmented Dickey-Fuller Test
# Dickey-Fuller = -2.0685, Lag order = 3, p-value = 0.453
# alternative hypothesis: explosive
```

and the KPSS test for the reverse null hypothesis, stationarity around a linear trend, against the alternative of a unit root:

```
kpss.test(x, null = "Trend", lshort = TRUE)
# KPSS Test for Trend Stationarity
# KPSS Trend = 0.2691, Truncation lag parameter = 1, p-value = 0.01
```

Results: according to the ADF test, a unit root is not rejected at the 5% significance level; according to the KPSS test, the null of stationarity around a linear trend is rejected in favour of a unit root.

Side note: using `lshort = FALSE`, the null of the KPSS test is not rejected at the 5% level; however, that setting selects 5 lags, and a further inspection (not shown here) suggested that choosing 1 to 3 lags is appropriate for these data and leads to rejecting the null hypothesis.

In principle, we should be guided by the test for which we were able to reject the null hypothesis, rather than by the test for which we failed to reject it. However, a regression of the original series on a linear trend turns out not to be reliable. On the one hand, the R-squared is high (over 90%), which the literature points to as an indicator of a spurious regression.

```
fit <- lm(x ~ 1 + poly(c(time(x))))
summary(fit)
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 28499.3 381.6 74.69 <2e-16 ***
#poly(c(time(x))) 91387.5 2880.9 31.72 <2e-16 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 2881 on 55 degrees of freedom
#Multiple R-squared: 0.9482, Adjusted R-squared: 0.9472
#F-statistic: 1006 on 1 and 55 DF, p-value: < 2.2e-16
```

On the other hand, the residuals are autocorrelated:

```
acf(residuals(fit)) # not displayed to save space
```

Moreover, the null of a unit root in the residuals cannot be rejected.

```
adf.test(residuals(fit))
# Augmented Dickey-Fuller Test
#Dickey-Fuller = -2.0685, Lag order = 3, p-value = 0.547
#alternative hypothesis: stationary
```

At this point, you can choose a model to be used to obtain forecasts. For example, forecasts based on a structural time series model and on an ARIMA model can be obtained as follows.

```
# StructTS
fit1 <- StructTS(x, type = "trend")
fit1
#Variances:
# level slope epsilon
#2982955 0 487180
#
# forecasts
p1 <- predict(fit1, 10, main = "Local trend model")
p1$pred
# [1] 49466.53 50150.56 50834.59 51518.62 52202.65 52886.68 53570.70 54254.73
# [9] 54938.76 55622.79
# ARIMA
require(forecast)
fit2 <- auto.arima(x, ic="bic", allowdrift = TRUE)
fit2
#ARIMA(0,1,0) with drift
#Coefficients:
# drift
# 736.4821
#s.e. 267.0055
#sigma^2 estimated as 3992341: log likelihood=-495.54
#AIC=995.09 AICc=995.31 BIC=999.14
#
# forecasts
p2 <- forecast(fit2, 10, main = "ARIMA model")
p2$mean
# [1] 49636.48 50372.96 51109.45 51845.93 52582.41 53318.89 54055.37 54791.86
# [9] 55528.34 56264.82
```

A plot of the forecasts:

```
par(mfrow = c(2, 1), mar = c(2.5,2.2,2,2))
plot((cbind(x, p1$pred)), plot.type = "single", type = "n",
ylim = range(c(x, p1$pred + 1.96 * p1$se)), main = "Local trend model")
grid()
lines(x)
lines(p1$pred, col = "blue")
lines(p1$pred + 1.96 * p1$se, col = "red", lty = 2)
lines(p1$pred - 1.96 * p1$se, col = "red", lty = 2)
legend("topleft", legend = c("forecasts", "95% confidence interval"),
lty = c(1,2), col = c("blue", "red"), bty = "n")
plot((cbind(x, p2$mean)), plot.type = "single", type = "n",
ylim = range(c(x, p2$upper)), main = "ARIMA (0,1,0) with drift")
grid()
lines(x)
lines(p2$mean, col = "blue")
lines(ts(p2$lower[,2], start = end(x)[1] + 1), col = "red", lty = 2)
lines(ts(p2$upper[,2], start = end(x)[1] + 1), col = "red", lty = 2)
```

The forecasts are similar in both cases and look reasonable. Notice that the forecasts follow a relatively deterministic pattern similar to a linear trend, even though we did not explicitly model a linear trend. The reason is the following:

i) in the local trend model, the variance of the slope component is estimated as zero. This turns the trend component into a drift that has the effect of a linear trend.

ii) in the ARIMA case, an ARIMA(0,1,0) model with a drift is selected for the differenced series. The effect of a constant term on a differenced series is a linear trend in the levels. This is discussed in this post.

You may check that if a local level model or an ARIMA(0,1,0) without drift is chosen, the forecasts are a straight horizontal line and, hence, bear no resemblance to the observed dynamics of the data. Well, this is part of the puzzle of unit root tests and deterministic components.
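Point ii) can also be checked numerically with a base-R sketch (the drift value below merely mimics the estimate reported above, and the innovation scale is arbitrary):

```r
# If (1 - B) y_t = c + e_t, then y_t = y_0 + c * t + cumulated noise,
# so the level of the series grows linearly with slope c on average.
set.seed(1)
c0 <- 736.48                           # drift, mimicking the estimate above
y  <- cumsum(rnorm(200, mean = c0, sd = 50))
unname(coef(lm(y ~ seq_along(y)))[2])  # fitted slope, close to the drift c0
```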

**Edit 1 (inspection of residuals):**

The autocorrelation and partial autocorrelation functions do not suggest any remaining structure in the residuals.

```
resid1 <- residuals(fit1)
resid2 <- residuals(fit2)
par(mfrow = c(2, 2))
acf(resid1, lag.max = 20, main = "ACF residuals. Local trend model")
pacf(resid1, lag.max = 20, main = "PACF residuals. Local trend model")
acf(resid2, lag.max = 20, main = "ACF residuals. ARIMA(0,1,0) with drift")
pacf(resid2, lag.max = 20, main = "PACF residuals. ARIMA(0,1,0) with drift")
```
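A Ljung-Box portmanteau test is a numeric complement to these ACF/PACF plots; in the analysis above it would be applied to `resid1` and `resid2`. A self-contained sketch on simulated white noise:

```r
# Ljung-Box test: null hypothesis of no autocorrelation up to the chosen lag.
set.seed(1)
wn <- rnorm(100)  # stand-in for white-noise residuals
bt <- Box.test(wn, lag = 10, type = "Ljung-Box")
bt$p.value        # a small p-value would signal remaining structure
```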

As IrishStat suggested, checking for the presence of outliers is also advisable. Two additive outliers are detected using the package `tsoutliers`.

```
require(tsoutliers)
resol <- tsoutliers(x, types = c("AO", "LS", "TC"),
remove.method = "bottom-up",
args.tsmethod = list(ic="bic", allowdrift=TRUE))
resol
#ARIMA(0,1,0) with drift
#Coefficients:
# drift AO2 AO51
# 736.4821 -3819.000 -4500.000
#s.e. 220.6171 1167.396 1167.397
#sigma^2 estimated as 2725622: log likelihood=-485.05
#AIC=978.09 AICc=978.88 BIC=986.2
#Outliers:
# type ind time coefhat tstat
#1 AO 2 2 -3819 -3.271
#2 AO 51 51 -4500 -3.855
```

Looking at the ACF, we can say that, at the 5% significance level, the residuals are random in this model as well.

```
par(mfrow = c(2, 1))
acf(residuals(resol$fit), lag.max = 20, main = "ACF residuals. ARIMA with additive outliers")
pacf(residuals(resol$fit), lag.max = 20, main = "PACF residuals. ARIMA with additive outliers")
```

In this case, the presence of potential outliers does not appear to distort the performance of the models. This is supported by the Jarque-Bera test for normality; the null of normality in the residuals from the initial models (`fit1`, `fit2`) is not rejected at the 5% significance level.

```
jarque.bera.test(resid1)[[1]]
# X-squared = 0.3221, df = 2, p-value = 0.8513
jarque.bera.test(resid2)[[1]]
#X-squared = 0.426, df = 2, p-value = 0.8082
```

**Edit 2 (plot of residuals and their values)**

This is how the residuals look (plot not reproduced here). And these are their values in CSV format (first column: residuals from the local trend model; second column: residuals from the ARIMA model):

```
0;6.9205
-0.9571;-2942.4821
2.6108;4695.5179
-0.5453;-2065.4821
-0.2026;-771.4821
0.1242;-208.4821
0.1909;-120.4821
-0.0179;-535.4821
0.1449;-153.4821
0.484;526.5179
1.0748;1722.5179
0.3818;299.5179
-1.061;-2434.4821
0.0996;156.5179
0.4805;663.5179
0.8969;1463.5179
0.4111;461.5179
-1.0595;-2361.4821
0.0098;65.5179
0.5605;920.5179
0.8835;1481.5179
0.7669;1232.5179
1.4024;2593.5179
0.3785;473.5179
-1.1032;-2233.4821
-0.3813;-492.4821
2.2745;4642.5179
0.2935;154.5179
-0.1138;-165.4821
-0.8035;-1455.4821
-1.2982;-2321.4821
-1.9463;-3565.4821
-0.1648;62.5179
-0.1022;-253.4821
0.9755;1882.5179
-0.5662;-1418.4821
-0.0176;28.5179
0.5;928.5179
0.6831;1189.5179
-1.8889;-3964.4821
0.3896;1136.5179
-1.3113;-2799.4821
-0.9934;-1800.4821
-0.4085;-748.4821
1.2902;2482.5179
-0.0996;-657.4821
0.5539;981.5179
2.0007;3725.5179
1.0227;1490.5179
0.27;263.5179
-2.336;-4736.4821
1.8994;4263.5179
0.1301;-236.4821
-0.0892;-236.4821
-0.1148;-236.4821
-1.1207;-2236.4821
0.4801;1163.5179
```

**Attribution**
*Source: Link, Question Author: forecaster, Answer Author: Community*