Formula for autocorrelation in R vs. Excel

I am trying to figure out how R computes lag-k autocorrelation (apparently, it is the same formula used by Minitab and SAS), so that I can compare it to using Excel’s CORREL function applied to the series and its k-lagged version. R and Excel (using CORREL) give slightly different autocorrelation values.

I’d also be interested to find out whether one computation is more correct than the other.


The exact equation is given in: Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer-Verlag. I’ll give you an example:

### simulate some data with AR(1) where rho = .75
xi <- 1:50
yi <- arima.sim(model=list(ar=.75), n=50)

### get residuals
res <- resid(lm(yi ~ xi))

### acf for lags 1 and 2
cor(res[1:49], res[2:50])      ### not quite how this is calculated by R
cor(res[1:48], res[3:50])      ### not quite how this is calculated by R

### how R calculates these
acf(res, lag.max=2, plot=F)

### how this is calculated by R
### note: mean(res) = 0 for this example, so technically not needed here
c0 <- 1/50 * sum( (res[1:50] - mean(res)) * (res[1:50] - mean(res)) ) 
c1 <- 1/50 * sum( (res[1:49] - mean(res)) * (res[2:50] - mean(res)) ) 
c2 <- 1/50 * sum( (res[1:48] - mean(res)) * (res[3:50] - mean(res)) ) 

And so on (e.g., res[1:47] and res[4:50] for lag 3).

Source : Link , Question Author : Galit Shmueli , Answer Author : Wolfgang

Leave a Comment