I need to forecast the following 4 variables for the 29th unit of time. I have roughly 2 years' worth of historical data, where times 1, 14, and 27 all fall at the same period (or time of year). In the end, I am doing an Oaxaca–Blinder-style decomposition on W, wd, wc, and p.
time        W       wd       wc        p
   1 4.920725 4.684342 4.065288 .5962985
   2 4.956172 4.73998  4.092179 .6151785
   3 4.85532  4.725982 4.002519 .6028712
   4 4.754887 4.674568 3.988028 .5943888
   5 4.862039 4.758899 4.045568 .5925704
   6 5.039032 4.791101 4.071131 .590314
   7 4.612594 4.656253 4.136271 .529247
   8 4.722339 4.631588 3.994956 .5801989
   9 4.679251 4.647347 3.954906 .5832723
  10 4.736177 4.679152 3.974465 .5843731
  11 4.738954 4.759482 4.037036 .5868722
  12 4.571325 4.707446 4.110281 .556147
  13 4.883891 4.750031 4.168203 .602057
  14 4.652408 4.703114 4.042872 .6059471
  15 4.677363 4.744875 4.232081 .5672519
  16 4.695732 4.614248 3.998735 .5838578
  17 4.633575 4.6025   3.943488 .5914644
  18 4.61025  4.67733  4.066427 .548952
  19 4.678374 4.741046 4.060458 .5416393
  20 4.48309  4.609238 4.000201 .5372143
  21 4.477549 4.583907 3.94821  .5515663
  22 4.555191 4.627404 3.93675  .5542806
  23 4.508585 4.595927 3.881685 .5572687
  24 4.467037 4.619762 3.909551 .5645944
  25 4.326283 4.544351 3.877583 .5738906
  26 4.672741 4.599463 3.953772 .5769604
  27 4.53551  4.506167 3.808779 .5831352
  28 4.528004 4.622972 3.90481  .5968299
I believe that W can be approximated by p⋅wd + (1−p)⋅wc plus measurement error, but you can see that W always considerably exceeds that quantity, whether because of waste, approximation error, or theft.
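A quick check of this relationship (a sketch, assuming the data are loaded into a data frame x with the columns shown above):

```r
# Assumes x is a data frame with columns W, wd, wc, p as in the table above.
implied <- x$p * x$wd + (1 - x$p) * x$wc   # p*wd + (1-p)*wc
gap <- x$W - implied                       # the "waste/theft" gap
summary(gap)                               # gap is positive for every observation
```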
Here are my 2 questions.
My first thought was to try a vector autoregression (VAR) on these variables with 1 lag and exogenous time and period variables, but that seems like a bad idea given how little data I have. Are there any time-series methods that (1) perform better in the face of “micronumerosity” and (2) would be able to exploit the link between the variables?
On the other hand, the moduli of the eigenvalues for the VAR are all less than 1, so I don’t think I need to worry about nonstationarity (though the Dickey–Fuller test suggests otherwise). The predictions seem mostly in line with projections from a flexible univariate model with a time trend, except for W and p, which are lower. The coefficients on the lags seem mostly reasonable, though they are largely insignificant. The linear trend coefficient is significant, as are some of the period dummies. Still, are there any theoretical reasons to prefer this simpler approach over the VAR model?
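For reference, the VAR described here can be fit in a few lines (a sketch using the vars package, assuming the data frame x and a period length of 13):

```r
library(vars)  # assumes install.packages("vars")

# VAR(1) with a constant, a linear trend, and 13-period seasonal dummies,
# matching the specification described in the question.
fit <- VAR(x[, c("W", "wd", "wc", "p")], p = 1, type = "both", season = 13)

roots(fit)  # moduli of the companion-matrix eigenvalues; all < 1 suggests stability
```

With only 28 observations, this spends many degrees of freedom per equation, which is exactly the concern raised above.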
Full disclosure: I asked a similar question on Statalist with no response.
Answer
I understand that this question has been sitting here for years, but the following ideas may still be useful:

If there are links between the variables (and the theoretical formula does not work so well), PCA can be used to look for (linear) dependencies in a systematic way. I will show that this works well for the data in this question.

Given that there is not much data (112 numbers in total), only a few model parameters can be estimated (e.g. fitting a full set of seasonal effects is not an option), so trying a custom model may make sense.
Here is how I would make a forecast, following these principles:
Step 1. We can use PCA to reveal dependencies in the data. Using R, with the data stored in the data frame x:
> library(jvcoords)
> m <- PCA(x)
> m
PCA: mapping p = 4 coordinates to q = 4 coordinates
                              PC1         PC2          PC3          PC4
standard deviation     0.18609759 0.079351671 0.0305622047 0.0155353709
variance               0.03463231 0.006296688 0.0009340484 0.0002413477
cum. variance fraction 0.82253436 0.972083769 0.9942678731 1.0000000000
This shows that the first two principal components explain 97% of the variance, and using three components covers 99.4% of the variance. Thus, it will be enough to make a model for the first two or three PCs. (The tiny PC4 variance reflects a near-linear dependence: up to an additive constant, the data approximately satisfy W = 0.234wd − 1.152wc − 8.842p.)
Doing the PCA involved finding a 4×4 orthogonal matrix. The space of such matrices is 6-dimensional, so we have estimated 6 parameters. (Since we only really use PC1 below, the number of “effective” parameters may be smaller.)
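The near-linear dependence can be read off the loadings of the least-variance component. As a sketch using base R's prcomp (in place of jvcoords, whose internals are not shown here):

```r
# prcomp centres the data; the last column of rotation is the direction of
# least variance, i.e. the coefficients of the approximate linear relation
# a*W + b*wd + c*wc + d*p ≈ const among the variables.
pr <- prcomp(x[, c("W", "wd", "wc", "p")])
a <- pr$rotation[, 4]   # loadings of PC4
print(a / a["W"])       # normalise so the coefficient of W is 1
```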
Step 2. There is a clear trend in PC1:
> t <- 1:28
> plot(m$y[,1], type = "b", ylab = "PC1")
> trend <- lm(m$y[,1] ~ t)
> abline(trend)
I create a copy of the PC scores with this trend removed:
> y2 <- m$y
> y2[,1] <- y2[,1] - fitted(trend)
Plotting the scores of the other PCs reveals no clear trends, so I leave these unchanged.
Since the PC scores are centred, the trend line goes through the centre of mass of the PC1 scores, so fitting the trend corresponds to estimating only one parameter (the slope).
Step 3. A pairs scatter plot shows no clear structure, so I model the PCs as being independent:
> pairs(y2, asp = 1, oma = c(1.7, 1.7, 1.7, 1.7))
Step 4. There is a clear periodicity in PC1, with lag 13 (as suggested in the question). This can be seen in different ways; for example, the lag-13 autocorrelation shows up as significantly different from 0 in a correlogram:
> acf(y2[,1])
(The periodicity is visually more striking when plotting the data together with a shifted copy.)
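That visual check can be sketched as follows (assuming the detrended scores y2 from above):

```r
# Overlay detrended PC1 with a copy shifted by the period length 13;
# the two curves tracking each other is the visual signature of the
# lag-13 periodicity.
plot(1:28, y2[, 1], type = "b", xlab = "t", ylab = "PC1 (detrended)")
lines(14:28, y2[1:15, 1], col = "red", type = "b")
legend("topright", legend = c("PC1", "PC1 shifted by 13"),
       col = c("black", "red"), lty = 1)
```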
Since we want to keep the number of estimated parameters low, and since the correlogram shows lag 13 as the only lag with a significant contribution, I will model PC1 as $y^{(1)}_{t+13} = \alpha_{13}\, y^{(1)}_t + \sigma \varepsilon_{t+13}$, where the $\varepsilon_t$ are independent and standard normally distributed (i.e. this is an AR(13) process with all but the lag-13 coefficient fixed to 0). An easy way to estimate $\alpha_{13}$ and $\sigma$ is using the lm() function:
> lag13 <- lm(y2[14:28,1] ~ y2[1:15,1] + 0)
> lag13

Call:
lm(formula = y2[14:28, 1] ~ y2[1:15, 1] + 0)

Coefficients:
y2[1:15, 1]
     0.6479

> a13 <- coef(lag13)
> s13 <- summary(lag13)$sigma
As a plausibility check, I plot the given data (black) together with a random trajectory of my model for PC1 (blue), ranging one year into the future:
t.f <- 29:41
pc1 <- m$y[,1]
pc1.f <- (predict(trend, newdata = data.frame(t = t.f))
          + a13 * y2[16:28, 1]
          + rnorm(13, sd = s13))
plot(t, pc1, xlim = range(t, t.f), ylim = range(pc1, pc1.f),
     type = "b", ylab = "PC1")
points(t.f, pc1.f, col = "blue", type = "b")
The blue, simulated piece of the path looks like a reasonable continuation of the data. The correlograms for PC2 and PC3 show no significant correlations, so I model these components as white noise. PC4 does show correlations, but contributes so little to the total variance that it seems not worth modelling, and I model this component as white noise too.
Here we have fitted two more parameters ($\alpha_{13}$ and $\sigma$). This brings us to a total of nine parameters in the model (including the PCA), which does not seem absurd given that we started with data consisting of 112 numbers.
Forecast. We can get a numeric forecast by leaving out the noise (to get the mean) and reversing the PCA:
> pc1.f <- predict(trend, newdata = data.frame(t = t.f)) + a13 * y2[16:28, 1]
> y.f <- data.frame(PC1 = pc1.f, PC2 = 0, PC3 = 0, PC4 = 0)
> x.f <- fromCoords(m, y.f)
> rownames(x.f) <- t.f
> x.f
          W       wd       wc         p
29 4.456825 4.582231 3.919151 0.5616497
30 4.407551 4.563510 3.899012 0.5582053
31 4.427701 4.571166 3.907248 0.5596139
32 4.466062 4.585740 3.922927 0.5622955
33 4.327391 4.533055 3.866250 0.5526018
34 4.304330 4.524294 3.856824 0.5509898
35 4.342835 4.538923 3.872562 0.5536814
36 4.297404 4.521663 3.853993 0.5505056
37 4.281638 4.515673 3.847549 0.5494035
38 4.186515 4.479533 3.808671 0.5427540
39 4.377147 4.551959 3.886586 0.5560799
40 4.257569 4.506528 3.837712 0.5477210
41 4.289875 4.518802 3.850916 0.5499793
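As a sanity check (a sketch, assuming the forecast matrix x.f from above), the forecasts preserve the property noted in the question that W exceeds p⋅wd + (1−p)⋅wc:

```r
# The question observed that W always exceeds p*wd + (1-p)*wc in the
# historical data; a reasonable forecast should keep this gap positive.
implied.f <- x.f[, "p"] * x.f[, "wd"] + (1 - x.f[, "p"]) * x.f[, "wc"]
all(x.f[, "W"] > implied.f)   # check the gap remains positive
```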
Uncertainty bands can be obtained either analytically or simply using Monte Carlo:
N <- 1000  # number of Monte Carlo samples
W.f <- matrix(NA, N, 13)
for (i in 1:N) {
  y.f <- data.frame(PC1 = (predict(trend, newdata = data.frame(t = t.f))
                           + a13 * y2[16:28, 1]
                           + rnorm(13, sd = s13)),
                    PC2 = rnorm(13, sd = sd(y2[, 2])),
                    PC3 = rnorm(13, sd = sd(y2[, 3])),
                    PC4 = rnorm(13, sd = sd(y2[, 4])))
  x.f <- fromCoords(m, y.f)
  W.f[i, ] <- x.f[, 1]
}
bands <- apply(W.f, 2,
               function(x) quantile(x, c(0.025, 0.15, 0.5, 0.85, 0.975)))
plot(t, x$W, xlim = range(t, t.f), ylim = range(x$W, bands),
     type = "b", ylab = "W")
for (b in 1:5) {
  lines(c(28, t.f), c(x$W[28], bands[b, ]), col = "grey")
}
The plot shows the actual data for W, together with 70% (inner three lines, delimited by the 15% and 85% quantiles) and 95% (outer two lines) uncertainty bands for a forecast using the fitted model.
Attribution
Source: Link, Question Author: dimitriy, Answer Author: kjetil b halvorsen