When using the persistent CD (PCD) learning algorithm for Restricted Boltzmann Machines, we start the Gibbs sampling chain at a data point in the first iteration but, unlike in normal CD, we don’t restart the chain in the following iterations. Instead, each chain continues from where it ended in the previous iteration.
In the normal CD algorithm each iteration evaluates a mini batch of data points and computes the Gibbs sampling chains starting from those data points themselves.
In persistent CD, should we keep a separate Gibbs sampling chain for every data point? Or should we keep just a mini batch of Gibbs sampling chains, which were started at data points that are not the ones being evaluated in the current iteration?
It seems to me that keeping a Gibbs sampling chain for every data point would be too cumbersome, but on the other hand it seems inadequate to compare the signals of the current sample with the signals after a long Gibbs chain which didn’t start at the current sample.
The original paper describing this can be found here
In section 4.4, they discuss the ways in which the algorithm can be implemented. The best implementation that they discovered initially was to not reset any Markov chains, to do one full Gibbs update on each Markov chain for each gradient estimate, and to use a number of Markov chains equal to the number of training data points in a mini-batch.
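To make that recipe concrete, here is a minimal NumPy sketch, not the paper's code. It assumes a binary RBM, and the function name `pcd_train`, the learning rate, the initialisation scale and the other hyperparameters are my own illustrative choices. The point to notice is that `chains` is created once, has as many rows as a mini-batch, and is never reset to the current data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_train(data, n_hidden=8, batch_size=10, epochs=5, lr=0.05, seed=0):
    """Illustrative PCD for a binary RBM (hypothetical helper, assumed hyperparameters)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases

    # Persistent chains: one per mini-batch slot, started at data points once.
    init = rng.choice(n_samples, batch_size, replace=False)
    chains = data[init].astype(float)

    for _ in range(epochs):
        perm = rng.permutation(n_samples)
        for start in range(0, n_samples - batch_size + 1, batch_size):
            v_data = data[perm[start:start + batch_size]]

            # Positive phase: hidden probabilities given the current mini-batch.
            h_data = sigmoid(v_data @ W + c)

            # Negative phase: one full Gibbs update (h | v, then v | h) of the
            # persistent chains, which are NOT reset to v_data.
            h_prob = sigmoid(chains @ W + c)
            h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
            v_prob = sigmoid(h_samp @ W.T + b)
            chains = (rng.random(v_prob.shape) < v_prob).astype(float)
            h_model = sigmoid(chains @ W + c)

            # Gradient estimate: data statistics minus chain statistics,
            # each averaged over its own batch.
            W += lr * (v_data.T @ h_data - chains.T @ h_model) / batch_size
            b += lr * (v_data - chains).mean(axis=0)
            c += lr * (h_data - h_model).mean(axis=0)
    return W, b, c, chains
```

In this sketch the two terms of the update are averaged separately, so no chain is paired with a particular data point. The negative term is an estimate of an expectation under the model, which does not depend on which data points happen to be in the current mini-batch.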
Section 3 might give you some intuition about the key idea behind PCD.