I fit an ARMA(5,5) model with regression on some covariates to a series, using arima() in R:
fit5 = arima(x, order=c(5,0,5), xreg=covaraites, include.mean=F)
I am now checking if the fitted model is adequate.
The residual series looks like:
The residual series passes the Ljung-Box test with a p-value of 0.3859. The ACF and PACF of the residual series are as follows (they seem uncorrelated, right?):
The qqplot for the residual series is as follows:
qqnorm(fit5$residuals, asp = 1)
qqline(fit5$residuals, asp = 1)
It looks okay within (-2, 2). I wonder whether it is acceptable to treat the residuals as Gaussian?
IIUC, an ARMA model requires its residual series to be white noise, but not necessarily Gaussian. But arima() fits the model using MLE (assuming a Gaussian residual series?). So if my residual series can't be taken as Gaussian, how should I revise my model, and what function in R can I use to fit it to my time series?
When you use MLE, a distributional assumption is applied, in this case probably Gaussian. So if your Gaussian assumption does not hold, then your likelihood function is misspecified, and the MLE is not reliable.
The QQ plot does not look good: it shows fatter tails than a normal distribution would have. You can test the normality assumption using a number of tests, such as the Jarque-Bera test. I bet they'll all reject normality.
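As a rough illustration of the Jarque-Bera check, here is a minimal sketch in base R that computes the statistic by hand from the sample skewness and kurtosis. The helper name jb_test and the simulated heavy-tailed series are assumptions made for the example; with your model you would pass residuals(fit5) instead.

```r
# Hypothetical helper: Jarque-Bera statistic computed from sample moments.
jb_test <- function(r) {
  r <- r[!is.na(r)]
  n <- length(r)
  z <- r - mean(r)
  s2 <- mean(z^2)                      # (biased) sample variance
  skew <- mean(z^3) / s2^1.5           # sample skewness
  kurt <- mean(z^4) / s2^2             # sample kurtosis (3 for a normal)
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  # Under normality the statistic is asymptotically chi-squared with 2 df.
  c(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Simulated stand-in for fat-tailed residuals (an assumption, not the real data).
set.seed(42)
heavy <- rt(1200, df = 3)
out <- jb_test(heavy)
print(out)
```

A small p-value here means the normality hypothesis is rejected. Packaged versions of this test also exist in add-on packages, but the hand computation avoids any dependency.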
The first and simplest thing to try is a log-transform. The look of your QQ plot reminds me of a lognormal distribution. You could look at the histogram of the residuals against a lognormal fit, or simply take the log of the variable, re-fit the ARIMA model, and then look at the residuals. I bet they'll look much more normal.
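To make the log-transform idea concrete, here is a small sketch on simulated data. Everything in it is an assumption for illustration: the series y is made up so that its log follows an AR(1), the AR(1) order stands in for your ARMA(5,5), and the transform only applies if your series is strictly positive.

```r
set.seed(7)
n <- 1200

# Hypothetical positive series whose logarithm is a Gaussian AR(1).
z <- arima.sim(model = list(ar = 0.5), n = n)
y <- exp(as.numeric(z))

# Fit the same model to the raw series and to its log.
fit_raw <- arima(y, order = c(1, 0, 0))
fit_log <- arima(log(y), order = c(1, 0, 0))

# Sample kurtosis as a crude tail check (about 3 for a normal distribution).
kurtosis <- function(r) {
  d <- r - mean(r)
  mean(d^4) / mean(d^2)^2
}
k_raw <- kurtosis(residuals(fit_raw))
k_log <- kurtosis(residuals(fit_log))
print(c(raw = k_raw, log = k_log))

# Visual check on the transformed fit.
qqnorm(residuals(fit_log))
qqline(residuals(fit_log))
```

If the residual kurtosis drops toward 3 and the QQ plot straightens after the transform, the Gaussian likelihood is more defensible on the log scale.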
I know some people on this board will suggest ignoring the non-normality of the residuals and acting as if you never cared about it "because you have a lot of (1200) observations". My stance on this is that if you use normality in any way, such as in your likelihood function, and the normality assumption does not hold, then you can't use the model.
In this case, I suggest you use models that do not assume normality. For example, you could try to fit t-distributed errors. In MATLAB there's an option to use a t-distribution in the arima class; there could be an option in R too. In any case it would be easy to modify the R code: all you need is a new likelihood function, which can be found in a number of places, such as the MATLAB help.
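Here is a minimal sketch of what such a new likelihood function could look like in base R. It is deliberately simplified and rests on several assumptions: an AR(1) instead of your ARMA(5,5) with covariates, a conditional likelihood that drops the first observation, simulated data, and a parameterization (tanh, exp, degrees of freedom above 2) chosen only to keep the optimizer in a valid region. It is not how arima() works internally.

```r
set.seed(1)
n <- 1200

# Simulated AR(1) with Student-t innovations (true phi = 0.6, df = 4).
e <- rt(n, df = 4)
x <- as.numeric(stats::filter(e, 0.6, method = "recursive"))

# Negative conditional log-likelihood with scaled t errors.
negloglik <- function(par, x) {
  phi   <- tanh(par[1])        # keeps |phi| < 1 (stationarity)
  sigma <- exp(par[2])         # keeps the scale positive
  nu    <- 2 + exp(par[3])     # assumed constraint: df > 2, finite variance
  res <- x[-1] - phi * x[-length(x)]
  # Density of sigma * T with T ~ t_nu is dt(res / sigma, nu) / sigma.
  -sum(dt(res / sigma, df = nu, log = TRUE) - log(sigma))
}

opt <- optim(c(0, 0, 0), negloglik, x = x, control = list(maxit = 2000))

# Map the optimizer's parameters back to the natural scale.
est <- c(phi   = tanh(opt$par[1]),
         sigma = exp(opt$par[2]),
         nu    = 2 + exp(opt$par[3]))
print(est)
```

Extending this to a full ARMA with regressors means replacing the one-line residual computation with the ARMA recursion on x minus the regression term. The t log-density part stays the same.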
Another option is to represent ARIMA in state-space form, see e.g. Shumway and Stoffer chapter 6, then use non-Gaussian errors, which is explained in section 6.10, particularly Example 6.23.