Limiting distribution of $\frac1n \sum_{k=1}^{n}|S_{k-1}|(X_k^2 - 1)$ where $X_k$ are i.i.d. standard normal

Let $(X_n)$ be a sequence of i.i.d. $\mathcal N(0,1)$ random variables. Define $S_0=0$ and $S_n=\sum_{k=1}^n X_k$ for $n\geq 1$. Find the limiting distribution of $$\frac1n \sum_{k=1}^{n}|S_{k-1}|(X_k^2 - 1)$$

This problem is from a problem book on Probability Theory, in the chapter on the Central Limit Theorem.

Since $S_{k-1}$ and $X_k$ are independent, $E(|S_{k-1}|(X_k^2-1))=0$ and $$V(|S_{k-1}|(X_k^2-1))=E(S_{k-1}^2(X_k^2-1)^2)=E(S_{k-1}^2)\,E((X_k^2-1)^2)=2(k-1)$$
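As a sanity check on the $\frac1n$ scaling, here is a short sketch of the variance of the normalized sum. It assumes nothing beyond the independence already used: for $j<k$, $X_k$ is independent of $(X_1,\dots,X_{k-1})$, so the cross terms vanish and the summands are uncorrelated.

$$E\Big(|S_{j-1}|(X_j^2-1)\,|S_{k-1}|(X_k^2-1)\Big)=E\Big(|S_{j-1}|(X_j^2-1)\,|S_{k-1}|\Big)\,E(X_k^2-1)=0 \qquad (j<k)$$

$$V\Big(\frac1n \sum_{k=1}^{n}|S_{k-1}|(X_k^2-1)\Big)=\frac{1}{n^2}\sum_{k=1}^{n}2(k-1)=\frac{n-1}{n}\xrightarrow[n\to\infty]{}1$$

So any limiting distribution should have variance at most 1, which at least confirms that $\frac1n$ is the right normalization.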

Note that the $|S_{k-1}|(X_k^2-1)$ are clearly not independent. The problem is from Shiryaev's Problems in Probability, which is itself based on the textbook by the same author. The textbook does not seem to cover the CLT for correlated variables. I don't know if there's a stationary, mixing sequence hiding somewhere…

I have run simulations to get a feel for the answer:

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

n = 20000  # number of terms in each sum
m = 2000   # number of samples

X = np.random.normal(size=(m, n))
sums = np.cumsum(X, axis=1)[:, :-1]           # S_1, ..., S_{n-1}
prods = (X[:, 1:]**2 - 1) * np.abs(sums)      # |S_{k-1}| (X_k^2 - 1) for k = 2..n (the k = 1 term is 0)
samples = np.sum(prods, axis=1) / n

plt.hist(samples, bins=100, density=True)
x = np.linspace(-6, 6, 100)
plt.plot(x, stats.norm.pdf(x, 0, 1/np.sqrt(2*np.pi)))  # normal density overlaid for comparison
plt.show()

Below is a histogram of 2000 samples ($n = 20000$). It looks fairly normally distributed…

[Figure: histogram of the 2000 simulated samples with a normal density overlaid]


When I simulate the distribution, I get something that resembles a Laplace distribution. A q-Gaussian seems to fit even better (the exact parameters would have to be found using theory).

I guess that your book must contain some variation of the CLT that relates to this (a q-generalised central limit theorem; it is probably in Section 7.6, "The central limit theorem for sums of dependent variables", but I can't look it up as I do not have the book available).


Qstore <- numeric(0) # vector to store results (start empty so no spurious 0 is added to the sample)

n <- 10^6  # columns X_i
m <- 10^2  # rows repetitions

pb <- txtProgressBar(title = "progress bar", min = 0,
                     max = 100, style=3)
for (i in 1:100) {  
  # doing this several times because this matrix method takes a lot of memory
  # with smaller numbers n*m it can be done at once

  X <- matrix(rnorm(n*m,0,1),m)
  S <- t(sapply(1:m, FUN = function(x) cumsum(X[x,])))
  S <- cbind(rep(0,m),S[,-n])
  R <- abs(S)*(X^2-1)
  Q <- t(sapply(1:m, FUN = function(x) cumsum(R[x,])))

  Qstore <- c(Qstore,t(Q[,n]))
  setTxtProgressBar(pb, i)
}
close(pb)

# compute histogram 
x <- seq(floor(min(Qstore/n)), ceiling(max(Qstore/n)), 0.2)
h <- hist(Qstore/(n),breaks = x)

# plot simulation
plot( h$mid, h$density, log = "y", xlim=c(-7,7),
      ylab = "log density" , xlab = expression(over(1,n)*sum(abs(S[k-1])*(X[k]^2-1),k==1,n) ) )

# distributions for comparison
lines(x, dnorm(x,0,1),                   col=1, lty=3)      #normal 
lines(x, dexp(abs(x),sqrt(2))/2,         col=1, lty=2)      #laplace
lines(x, qGaussian::dqgauss(x,sqrt(2),0,1/sqrt(2)), col=1, lty=1)      #qgauss

# further plotting
title("10^4 repetitions with n=10^6")
legend(-7,0.6,c("Gaussian", "Laplace", "Q-Gaussian"),col=1, lty=c(3,2,1),cex=0.8)
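One rough way to quantify "normal versus Laplace" without reading it off a histogram is to compare excess kurtosis, which is 0 for a normal distribution and 3 for a Laplace distribution. The following is only a minimal sketch in Python. The helper names `simulate` and `excess_kurtosis`, the seed, and the sizes `n` and `m` are illustrative choices, not taken from the posts above, and the sizes are much smaller than in the simulations above, so the estimate is noisy.

```python
import numpy as np

def simulate(n, m, seed=0):
    """Draw m samples of (1/n) * sum_{k=1}^n |S_{k-1}| (X_k^2 - 1)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(size=(m, n))
    S = np.cumsum(X, axis=1)
    # S_prev[:, k-1] holds S_{k-1}; the first column is S_0 = 0
    S_prev = np.hstack([np.zeros((m, 1)), S[:, :-1]])
    return np.sum(np.abs(S_prev) * (X**2 - 1), axis=1) / n

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a normal, 3 for a Laplace distribution."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

samples = simulate(n=2000, m=2000)
print("sample variance:", samples.var())
print("excess kurtosis:", excess_kurtosis(samples))
```

A value clearly above 0 would point away from a normal limit. Sample kurtosis converges slowly for heavy-tailed data, so this should be read as a qualitative indication only.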

Source : Link , Question Author : Gabriel Romon , Answer Author : Sextus Empiricus
