I’m trying to understand the receptive fields of CNNs better. To do that, I would like to calculate the receptive field of each neuron in LeNet. For a normal MLP it’s rather easy (see http://deeplearning.net/tutorial/lenet.html#sparse-connectivity), but it’s more difficult to calculate the receptive field of a neuron in a layer that follows one or more convolutional and pooling layers.
What is the receptive field of a neuron in the second convolutional layer? How much bigger is it in the following subsampling/pooling layer? And what is the formula for calculating these?
If you think about a convolutional net as an instance of a standard MLP, you can figure out the receptive fields in exactly the same way as the example you linked.
Each of the “destination pixels” of that image corresponds to a neuron whose inputs are the blue square in the source image. Depending on your network architecture the convolutions may not exactly correspond to pixels like that, but it’s the same idea. The weights used as inputs for all of those convolutional neurons are tied, but that’s irrelevant to what you’re thinking about here.
Pooling neurons can be thought of in the same way: the receptive field of a pooling neuron is the union of the receptive fields of its inputs.
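As for a formula: along one spatial axis, a layer with kernel size k and stride s grows the receptive field as r_out = r_in + (k - 1) * j_in, where j_in is the distance, in input pixels, between neighbouring neurons of the previous layer, and that distance updates as j_out = j_in * s. Here is a minimal sketch of that recurrence. The layer sizes are an assumption on my part (5x5 convolutions with stride 1 and 2x2 non-overlapping pooling, as in a typical LeNet-style network), and the function and variable names are just for illustration. Padding moves the position of the receptive field but not its size, so it is left out.

```python
def receptive_fields(layers):
    """Return (name, receptive_field, jump) for each layer, along one spatial axis.

    layers: list of (name, kernel_size, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1  # an input pixel sees only itself; adjacent pixels are 1 apart
    result = []
    for name, kernel, stride in layers:
        # Each extra kernel tap reaches `jump` input pixels further.
        rf = rf + (kernel - 1) * jump
        # Neighbouring outputs are `stride` times further apart in input pixels.
        jump = jump * stride
        result.append((name, rf, jump))
    return result


# Assumed LeNet-like stack (hypothetical sizes, not taken from the question).
lenet_like = [
    ("conv1", 5, 1),
    ("pool1", 2, 2),
    ("conv2", 5, 1),
    ("pool2", 2, 2),
]

for name, rf, jump in receptive_fields(lenet_like):
    print(f"{name}: receptive field {rf}x{rf}, jump {jump}")
```

Under those assumed sizes, a neuron in the second convolutional layer sees a 14x14 patch of the input, and a neuron in the pooling layer after it sees 16x16. If your LeNet variant uses different kernel sizes or strides, change the tuples accordingly.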