I would like to forecast a non-stationary time series subject to several crucial a priori assumptions that follow from studying instances of such series.
I’ve constructed the time-averaged one-point probability density function, approximated by a normal distribution: $\hat{p}(x) = \frac{1}{\sqrt{2\pi\sigma_\infty^2}} \exp\left(-\frac{x^2}{2\sigma_\infty^2}\right)$
From this point of view, I want the forecast $z_t(l)$ to stay within this distribution as $l \to \infty$. In other words, the variance of $z_t(l)$ must be bounded.
The averaged two-point probability density function $\hat{p}(x_i, i; x_j, j)$ has also been constructed, which led to an estimate of the autocorrelation function: $\rho(j) \approx A j^{-\alpha}$ with $0 < \alpha < 0.5$.
At first, the Box-Jenkins identification process led me to an ARIMA(0,1,3) model; however:
I can't have bounded forecast variance as long as $d \neq 0$ (this follows from the equations for the Box-Jenkins weights $\psi_j$). At the same time, I can't use $d = 0$, since the initial autocorrelation decays slowly (which, according to Box-Jenkins, is probably evidence of non-stationarity). This is the main obstacle for me.
Visually, simulations of ARIMA(0,1,3) do not match the behaviour of my samples, and the correlations of the first difference of the series agree poorly with the correlations implied by the model.
The analysis of residuals shows significant correlations starting at lag 3. This is why my initial choice of ARIMA(0,1,3) is incorrect.
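The bounded-variance obstacle for $d = 1$ can be made concrete with a small sketch. For an ARIMA(0,1,q) model written in the Box-Jenkins convention $(1 - B) z_t = (1 - \theta_1 B - \dots - \theta_q B^q) a_t$, the weights are $\psi_j = 1 - \theta_1 - \dots - \theta_{\min(j,q)}$, and the $l$-step forecast error variance is $\sigma_a^2 \sum_{j=0}^{l-1} \psi_j^2$. The $\theta$ values below are hypothetical and are not fitted to my data; the function names are my own.

```python
def psi_weights_ima(thetas, n):
    """First n psi weights of ARIMA(0,1,q): psi_j is the running sum of (1, -theta_1, ..., -theta_q)."""
    coeffs = [1.0] + [-t for t in thetas]
    psi, running = [], 0.0
    for j in range(n):
        if j < len(coeffs):
            running += coeffs[j]
        psi.append(running)
    return psi


def forecast_variance(psi, sigma2=1.0):
    """Forecast error variance at lead times 1..len(psi): sigma2 * cumulative sum of psi_j^2."""
    out, total = [], 0.0
    for w in psi:
        total += w * w
        out.append(sigma2 * total)
    return out


thetas = [0.5, 0.2, 0.1]           # hypothetical MA(3) coefficients, for illustration only
psi = psi_weights_ima(thetas, 200)
var = forecast_variance(psi)
print(psi[:5])                      # the weights settle at 1 - sum(thetas) from lag 3 onwards
print(var[9], var[99], var[199])    # the variance keeps growing with the lead time
```

Because $\psi_j$ stays at the constant $1 - \sum_i \theta_i$ for $j \ge q$, the variance grows linearly with the lead time unless that constant is exactly zero, which would make the MA polynomial cancel the unit root.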
When fitting various ARIMA(p,0,0) models, I see significant residual correlations near lag p for every p. This suggests that I need an ARIMA(∞,0,q) model as the limiting choice, for instance fractional ARIMA.
From Granger and Joyeux [1] I've learned about fractional ARIMA(p,d,q) models, which are in effect ARIMA(∞,0,q).
I've not found any GNU R packages for this that support missing values. Missing values seem to be a challenge in themselves.
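To see why a fractional model is attractive here, recall that for fractional noise $(1 - B)^d z_t = a_t$ with $0 < d < 0.5$ the autocorrelation decays as $\rho(j) \sim j^{2d - 1}$, so a power law $j^{-\alpha}$ corresponds to $d = (1 - \alpha)/2$, and the process variance is finite: $\sigma_a^2\,\Gamma(1 - 2d)/\Gamma(1 - d)^2$. The sketch below computes the AR(∞) and MA(∞) weights by their standard recursions and checks that the cumulative sum of $\psi_j^2$ stays below that limit. The value $\alpha = 0.4$ is a hypothetical choice inside my range, not an estimate, and the function names are my own.

```python
import math


def frac_ar_weights(d, n):
    """First n coefficients of (1 - B)^d: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k."""
    pi = [1.0]
    for k in range(1, n):
        pi.append(pi[-1] * (k - 1 - d) / k)
    return pi


def frac_ma_weights(d, n):
    """First n coefficients of (1 - B)^(-d): psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = [1.0]
    for k in range(1, n):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


alpha = 0.4                          # hypothetical decay exponent, 0 < alpha < 0.5
d = (1 - alpha) / 2                  # implied fractional differencing order
psi = frac_ma_weights(d, 10000)
partial_variance = sum(w * w for w in psi)                      # forecast variance at lead 10000
limit_variance = math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2  # variance as lead -> infinity
print(d, partial_variance, limit_variance)
print(frac_ar_weights(d, 5))         # slowly decaying AR(infinity) coefficients
```

Unlike the $d = 1$ case, the forecast error variance here approaches a finite limit (for unit innovation variance), which is the bounded-variance behaviour I am after.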
Publications on fractional ARIMA are quite rare. Are such fractional models really used? Maybe there is a good replacement for ARIMA models for my needs? Forecasting is not my main field; my interest is purely pragmatic.
From various sources (for instance Grassi and de Magistris [2]), I learned that it is practically impossible to decide between fractional ARIMA and models with a "level shift". However, I have not found a GNU R package for fitting level-shift models.
[1]: Granger, Joyeux, Journal of Time Series Analysis, vol. 1, no. 1, 1980, p. 15.
[2]: Grassi, de Magistris, "When long memory meets the Kalman filter: A comparative study", Computational Statistics and Data Analysis, 2012, in press.
Update: to report my own progress and to answer @IrishStat
My statement about the two-point probability distribution is incorrect in general. A function constructed in this way depends on the full series length, so there is little to extract from it. At the very least, the parameter α depends on the full series length.
Lists 2 and 3 also have been updated.
My data is available as a dat file here.
At the moment I am undecided between FARIMA and level shifts, and I still can't find appropriate software to check these options. This is also my first experience with model identification, so any help will be appreciated.
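For the level-shift alternative, the crudest possible check is a scan for a single shift in the mean: try every split point and keep the one that most reduces the residual sum of squares. The sketch below does only that. It ignores autocorrelation, handles one shift only, and runs on synthetic data rather than my series, so it is an illustration of the idea and not a replacement for a proper level-shift model; the function name and all parameters are my own.

```python
import random


def best_level_shift(x, min_seg=10):
    """Return (tau, shift): the split index that most reduces the SSE, and the mean change there."""
    n = len(x)
    total = sum(x)
    left = sum(x[:min_seg])          # running sum of x[0:tau]
    best_tau, best_gain = None, -1.0
    for tau in range(min_seg, n - min_seg + 1):
        right = total - left
        # SSE of one common mean minus SSE of two segment means
        gain = left * left / tau + right * right / (n - tau) - total * total / n
        if gain > best_gain:
            best_tau, best_gain = tau, gain
        left += x[tau]
    mean_before = sum(x[:best_tau]) / best_tau
    mean_after = sum(x[best_tau:]) / (n - best_tau)
    return best_tau, mean_after - mean_before


random.seed(0)
# Synthetic series: white noise with a step of size 5 at index 150 (not my data)
x = [random.gauss(0.0, 1.0) for _ in range(150)] + [random.gauss(5.0, 1.0) for _ in range(150)]
tau, shift = best_level_shift(x)
print(tau, shift)
```

On a long-memory series such a scan will tend to report shifts even when none exist, which is one way to see why the two model classes are hard to tell apart.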
I have never seen a model like the one in "Box-Jenkins identification process led me to ARIMA(0,1,3) model", but then I had never seen a black swan until I went to Australia. Please post your data, as it may suggest the need for
- Intervention detection, leading to the inclusion of level shifts, local time trends, etc.
- Time varying parameters
- Time varying error variance
If your data is confidential, simply scale it.
OK, having received your data (some 80,000 readings), I selected 805 observations starting at point 6287 and obtained the following.
A significant change point was detected at period 137, suggesting time-varying parameters. The remaining 668 observations suggest an ARIMA(3,0,0) model with a level (step) shift, supporting your preliminary conclusions about lag 3.
[Figure: Actual/Fit/Forecast graph]
[Figure: residual plot and ACF of the residuals]
Since the ACF of the residuals shows strong structure at lags 5 and 10, you might further investigate seasonal structure at lag 5. I hope this helps.
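A residual check of this kind can be reproduced with a few lines: compute the sample ACF and flag lags outside the approximate white-noise band $\pm 1.96/\sqrt{n}$. The sketch below is not the software I used, and the "residuals" are synthetic, generated with a hypothetical lag-5 dependence of 0.6 purely to show what structure at lags 5 and 10 looks like.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the usual biased estimator."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


random.seed(1)
n = 668                               # same length as the fitted segment; the values are synthetic
burn_in = 100
resid = [random.gauss(0.0, 1.0) for _ in range(5)]
for t in range(5, n + burn_in):
    # hypothetical seasonal dependence at lag 5
    resid.append(0.6 * resid[t - 5] + random.gauss(0.0, 1.0))
resid = resid[-n:]                    # drop the burn-in

acf = sample_acf(resid, 12)
bound = 1.96 / math.sqrt(n)           # approximate 95% band under white noise
flagged = [k for k in range(1, 13) if abs(acf[k]) > bound]
print(bound, flagged)
```

If lags 5 and 10 stand out like this on the real residuals, adding a seasonal term at lag 5 and re-checking the ACF would be the natural next step.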