I would like some advice on an analysis method I am using, to know whether it is statistically sound.
I have measured two point processes T1 = (t1_1, t1_2, ..., t1_n) and T2 = (t2_1, t2_2, ..., t2_m) and I want to determine if the events in T1 are somehow correlated to the events in T2.
One of the methods that I have found in the literature is that of constructing a cross-correlation histogram: for each event in T1 we find the delay to all the events of T2 that fall in a given window of time (before and after that event), and then we construct a histogram of all these delays.
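To make the construction concrete, here is a minimal R sketch of such a cross-correlation histogram. The function name `cross_delays`, the simulated event times, the 0.5 s window and the 50 ms bin width are all assumptions chosen for illustration, not values from my data.

```r
# Delays from every event in t1 to every event in t2 that lies within
# +/- window of it (positive delay = the t2 event comes after the t1 event).
cross_delays <- function(t1, t2, window) {
  d <- outer(t2, t1, "-")        # d[j, i] = t2[j] - t1[i]
  d[abs(d) <= window]
}

set.seed(1)
t1 <- sort(runif(200, 0, 100))   # simulated event times in seconds (illustration only)
t2 <- sort(runif(300, 0, 100))

window <- 0.5
delays <- cross_delays(t1, t2, window)

# Histogram of delays; all processes would be binned with the same breaks.
h <- hist(delays, breaks = seq(-window, window, by = 0.05), plot = FALSE)
h$counts
```

`outer()` builds the full matrix of pairwise differences, which is fine for a few thousand events but would need a windowed search for much longer recordings.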
If the two processes are not correlated I would expect a flat histogram, as the probability of having an event in T2 after (or before) an event in T1 is equal at all delays. On the other hand, if there is a peak in the histogram, this suggests that the two point processes are somehow influencing each other (or, at least, have some common input).
Now, this is nice and good, but how do I determine whether the histograms do have a peak (I have to say that for my particular set of data they’re clearly flat, but still it would be nice to have a statistical way of confirming that)?
So, here is what I’ve done: I’ve repeated the process of generating the histogram 1000 times, keeping T1 as it is and using a “shuffled” version of T2.
To shuffle T2 I calculate the intervals between all the events, shuffle them and sum them to reconstitute a new point process. In R I simply do this with:
times2.swp <- cumsum(sample(diff(times2)))
So, I end up with 1000 new histograms, which show me the density of events in T2∗ relative to T1.
For each bin of these histograms (they’re all binned in the same way) I calculate the 95th percentile of the density across the 1000 shuffled histograms. In other words I’m saying, for instance: at a time delay of 5 ms, in 95% of the shuffled point processes the probability of finding an event in T2∗ after an event in T1 is at most x.
I would then take this 95% value for all of the time delays and use it as some “confidence limit” (probably this is not the correct term) so that anything that goes over this limit in the original histogram can be considered a “true peak”.
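The whole procedure can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the helper names `cross_hist` and `shuffle_intervals`, the simulated event times and the bin layout are hypothetical, and the shuffle here keeps the first event time as the origin, which differs slightly from the one-liner above (that one starts the new process at the first shuffled interval).

```r
# Counts per delay bin for all (t1, t2) pairs whose delay falls inside the breaks.
cross_hist <- function(t1, t2, breaks) {
  d <- outer(t2, t1, "-")                      # d[j, i] = t2[j] - t1[i]
  d <- d[d >= min(breaks) & d <= max(breaks)]
  hist(d, breaks = breaks, plot = FALSE)$counts
}

# Shuffle the inter-event intervals and rebuild a point process.
# Assumption: the first event time is kept fixed as the origin.
shuffle_intervals <- function(t) {
  t[1] + c(0, cumsum(sample(diff(t))))
}

set.seed(42)
t1 <- sort(runif(200, 0, 100))                 # simulated data, illustration only
t2 <- sort(runif(300, 0, 100))
breaks <- seq(-0.5, 0.5, by = 0.05)

observed <- cross_hist(t1, t2, breaks)

# One column per shuffled realisation, one row per delay bin.
shuffled <- replicate(1000, cross_hist(t1, shuffle_intervals(t2), breaks))

# Per-bin 95th percentile across the shuffled histograms.
upper95 <- apply(shuffled, 1, quantile, probs = 0.95)

# Bins where the observed histogram exceeds the limit.
which(observed > upper95)
```

The sketch works on raw counts rather than densities; since every histogram uses the same breaks and the same T1, this only changes the scale of the comparison.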
Question 1: is this method statistically correct? If not, how would you tackle this problem?
Question 2: another thing that I want to see is whether there is a “longer” type of correlation in my data. For instance, there may be similar changes in the rate of events in the two point processes (note that they may have quite different rates), but I’m not sure how to test for that. I thought of creating an “envelope” of each point process using some sort of smoothing kernel and then performing a cross-correlation analysis of the two envelopes. Could you suggest any other possible type of analysis?
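A rough sketch of the envelope idea, assuming a Gaussian kernel and base R's `density()` and `ccf()`. The helper name `smooth_rate`, the bandwidth of 2 s, the 512-point grid and the simulated event times are all arbitrary choices for illustration.

```r
# Smoothed event rate on a common time grid, via a Gaussian kernel.
smooth_rate <- function(t, from, to, bw, n = 512) {
  d <- density(t, bw = bw, kernel = "gaussian", from = from, to = to, n = n)
  d$y * length(t)    # density integrates to 1, so rescale to events per unit time
}

set.seed(7)
t1 <- sort(runif(200, 0, 100))   # simulated data with different rates (illustration only)
t2 <- sort(runif(600, 0, 100))

n_grid <- 512
r1 <- smooth_rate(t1, from = 0, to = 100, bw = 2, n = n_grid)
r2 <- smooth_rate(t2, from = 0, to = 100, bw = 2, n = n_grid)

# Cross-correlation of the two envelopes; ccf() removes the means and
# normalises, so the difference in overall rate does not matter.
cc <- ccf(r1, r2, lag.max = 50, plot = FALSE)

# Convert lags from grid steps to time units.
lag_time <- drop(cc$lag) * (100 - 0) / (n_grid - 1)
```

Smoothing makes each envelope strongly autocorrelated, so the default significance bounds that `ccf()` would draw are not appropriate here; a shuffled-T2 comparison like the one above would be needed to judge any peak.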
Thank you and sorry for this very long question.
A standard method to analyze this problem in two or more dimensions is Ripley’s (cross) K function, but there’s no reason not to use it in one dimension, too. (A Google search does a good job of digging up references.) Essentially, it plots the CDF of all distances between points in the two realizations rather than a histogram approximation to the PDF of those distances. (A variant, the L function, plots the difference between K and the null distribution for two uniform uncorrelated processes.) This neatly sidesteps most of the issues you are confronting with the need to choose bins, to smooth, etc. Confidence bands for K are typically created through simulation. This is easy to do in R. Many spatial stats packages for R can be used directly or readily adapted to this 1D case. Roger Bivand’s overview page on CRAN lists these packages: refer to the section on “Point Pattern Analysis.”
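As a rough illustration of the 1D case, here is a hand-rolled sketch in base R rather than a call into any particular package. It assumes no edge correction, an observation interval of length `len`, and a null model in which T2 is replaced by uniformly scattered points. The function name `cross_K` and the simulated data are hypothetical.

```r
# Empirical cross K function in one dimension, without edge correction:
# K(r) = len / (n1 * n2) * #{(i, j) : |t1[i] - t2[j]| <= r}
cross_K <- function(t1, t2, r, len) {
  d <- abs(outer(t1, t2, "-"))
  d <- sort(d[d <= max(r)])                 # only pairs within the largest radius matter
  findInterval(r, d) * len / (length(t1) * length(t2))
}

set.seed(3)
len <- 100
t1 <- sort(runif(200, 0, len))              # simulated data, illustration only
t2 <- sort(runif(300, 0, len))
r  <- seq(0.05, 1, by = 0.05)

K_obs <- cross_K(t1, t2, r, len)

# Simulation band: keep t1 fixed, replace t2 by uniform points on [0, len].
K_sim <- replicate(999, cross_K(t1, runif(length(t2), 0, len), r, len))
band  <- apply(K_sim, 1, quantile, probs = c(0.025, 0.975))

# For independent uniform processes K(r) is roughly 2 * r in 1D (ignoring
# edge effects), so K_obs - 2 * r plays the role of the L-type difference.
outside <- which(K_obs < band[1, ] | K_obs > band[2, ])
```

The band is pointwise in `r`, and the uniform null could be swapped for a shuffled-interval null if the interval distribution of T2 should be preserved.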