# Limiting distribution of $$\frac1n \sum_{k=1}^{n}|S_{k-1}|(X_k^2 - 1)$$ where $$X_k$$ are i.i.d. standard normal

Let $$(X_n)$$ be a sequence of i.i.d. $$\mathcal N(0,1)$$ random variables. Define $$S_0=0$$ and $$S_n=\sum_{k=1}^n X_k$$ for $$n\geq 1$$. Find the limiting distribution of $$\frac1n \sum_{k=1}^{n}|S_{k-1}|(X_k^2 - 1)$$

This problem is from a problem book on Probability Theory, in the chapter on the Central Limit Theorem.

Since $$S_{k-1}$$ and $$X_k$$ are independent, $$E(|S_{k-1}|(X_k^2 - 1))=0$$ and $$V(|S_{k-1}|(X_k^2 - 1)) = E(S_{k-1}^2(X_k^2 - 1)^2)= E(S_{k-1}^2)E((X_k^2 - 1)^2) =2(k-1)$$.
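A short worked step that may help here; it is a sketch rather than part of the book's solution. For $$j<k$$, the variable $$X_k$$ is independent of $$(X_1,\dots,X_{k-1})$$, so the cross terms vanish:

$$E\big(|S_{j-1}|(X_j^2-1)\,|S_{k-1}|(X_k^2-1)\big) = E\big(|S_{j-1}|(X_j^2-1)\,|S_{k-1}|\big)\,E(X_k^2-1) = 0.$$

The summands are therefore uncorrelated, and the variance of the normalised sum is

$$V\Big(\frac1n \sum_{k=1}^{n}|S_{k-1}|(X_k^2 - 1)\Big) = \frac{1}{n^2}\sum_{k=1}^{n} 2(k-1) = \frac{n-1}{n} \longrightarrow 1.$$

So the $$\frac1n$$ normalisation gives a limit with unit variance, provided the limit exists. This says nothing yet about its shape.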

Note that the $$|S_{k-1}|(X_k^2 - 1)$$ are clearly not independent. The problem is from Shiryaev’s Problems in Probability, which is itself based on the textbook by the same author. The textbook does not seem to cover the CLT for correlated variables. I don’t know if there’s a stationary, mixing sequence hiding somewhere…

I have run simulations to get a feel for the answer:

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

n = 20000  # number of terms in the sum
m = 2000   # number of simulated samples

X = np.random.normal(size=(m, n))
# partial sums S_1, ..., S_{n-1} (the k = 1 term vanishes because S_0 = 0)
sums = np.cumsum(X, axis=1)
sums = np.delete(sums, -1, 1)
# pair |S_{k-1}| with X_k^2 - 1 for k = 2, ..., n
prods = np.delete(X**2 - 1, 0, 1) * np.abs(sums)
samples = 1/n * np.sum(prods, axis=1)

plt.hist(samples, bins=100, density=True)
x = np.linspace(-6, 6, 100)
# overlay a centred normal density (scale 1/sqrt(2*pi)) for comparison
plt.plot(x, stats.norm.pdf(x, 0, 1/np.sqrt(2*np.pi)))
plt.show()


Below is a histogram of $$2000$$ samples ($$n=20000$$). It looks fairly normally distributed…

When I simulate the distribution, I get something that resembles a Laplace distribution. A q-Gaussian seems to fit even better (the exact parameters would have to be found from theory).

I guess that your book contains some variation of the CLT that covers this (a q-generalised central limit theorem; it is probably in Section 7.6, The central limit theorem for sums of dependent variables, but I can’t look it up as I do not have the book available).

library(qGaussian)
set.seed(1)
Qstore <- numeric(0) # empty vector to store results (no spurious initial 0)

n <- 10^6  # columns X_i
m <- 10^2  # rows repetitions

pb <- txtProgressBar(title = "progress bar", min = 0,
max = 100, style=3)
for (i in 1:100) {
  # doing this several times because this matrix method takes a lot of memory
  # with smaller numbers n*m it can be done at once

  X <- matrix(rnorm(n*m,0,1),m)
  S <- t(sapply(1:m, FUN = function(x) cumsum(X[x,])))  # row-wise partial sums S_1..S_n
  S <- cbind(rep(0,m),S[,-n])                           # shift to S_0..S_{n-1}
  R <- abs(S)*(X^2-1)
  Q <- t(sapply(1:m, FUN = function(x) cumsum(R[x,])))

  Qstore <- c(Qstore,t(Q[,n]))
  setTxtProgressBar(pb, i)
}
close(pb)

# compute histogram
x <- seq(floor(min(Qstore/n)), ceiling(max(Qstore/n)), 0.2)
h <- hist(Qstore/(n),breaks = x)

# plot simulation
plot( h$mids, h$density, log = "y", xlim=c(-7,7),
      ylab = "log density" , xlab = expression(over(1,n)*sum(abs(S[k-1])*(X[k]^2-1),k==1,n) ) )

# distributions for comparison
lines(x, dnorm(x,0,1),                   col=1, lty=3)      #normal
lines(x, dexp(abs(x),sqrt(2))/2,         col=1, lty=2)      #laplace
lines(x, qGaussian::dqgauss(x,sqrt(2),0,1/sqrt(2)), col=1, lty=1)      #qgauss

# further plotting
title("10^4 repetitions with n=10^6")
legend(-7,0.6,c("Gaussian", "Laplace", "Q-Gaussian"),col=1, lty=c(3,2,1),cex=0.8)
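
As a rough numerical cross-check of the comparison above, one can look at the sample variance and excess kurtosis instead of judging the histogram by eye. The Python sketch below is only an illustration: the helper name `simulate_statistic`, the seed, and the sizes `n=1000`, `m=4000` are my own choices. Excess kurtosis is 0 for a normal distribution and 3 for a Laplace distribution, so a value clearly above 0 points to heavier-than-normal tails. The estimate is noisy for heavy-tailed samples and should be read qualitatively.

```python
import numpy as np
from scipy import stats


def simulate_statistic(n, m, rng):
    """Draw m realisations of (1/n) * sum_{k=1}^n |S_{k-1}| (X_k^2 - 1)."""
    X = rng.standard_normal(size=(m, n))
    S = np.cumsum(X, axis=1)
    # Column k-1 of S_prev holds S_{k-1}; the first column is S_0 = 0.
    S_prev = np.hstack([np.zeros((m, 1)), S[:, :-1]])
    return np.sum(np.abs(S_prev) * (X**2 - 1), axis=1) / n


rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
samples = simulate_statistic(n=1000, m=4000, rng=rng)

sample_var = samples.var(ddof=1)
# Fisher definition: 0 for a normal law, 3 for a Laplace law.
excess_kurt = stats.kurtosis(samples)

print(f"sample variance: {sample_var:.3f}")
print(f"excess kurtosis: {excess_kurt:.3f}  (normal: 0, Laplace: 3)")
```

Increasing `m` stabilises the kurtosis estimate. Increasing `n` brings the statistic closer to its limit, with memory use growing like `n*m`.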