# CNN Xavier weight initialization

Some tutorials I found state that “Xavier” weight initialization (paper: *Understanding the difficulty of training deep feedforward neural networks*, Glorot & Bengio, 2010) is an efficient way to initialize the weights of neural networks.

For fully-connected layers, those tutorials gave a rule of thumb:

$$Var(W) = \frac{2}{n_{in} + n_{out}}$$

where $Var(W)$ is the variance of a layer's weights, which are drawn from a normal distribution, and $n_{in}$ and $n_{out}$ are the numbers of neurons in the parent layer and in the current layer.
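A minimal NumPy sketch of this rule for a fully-connected layer, assuming the weights are drawn from a zero-mean normal distribution with $Var(W) = 2/(n_{in} + n_{out})$. The function name `xavier_normal_fc` and the layer sizes 784 and 256 are hypothetical, chosen only for illustration.

```python
import numpy as np

def xavier_normal_fc(n_in, n_out, rng=None):
    """Sample an (n_in, n_out) weight matrix with Var(W) = 2 / (n_in + n_out)."""
    rng = np.random.default_rng() if rng is None else rng
    # The normal distribution is parameterized by its standard deviation,
    # so take the square root of the target variance.
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

# Hypothetical layer: 784 inputs feeding 256 neurons.
W = xavier_normal_fc(784, 256, rng=np.random.default_rng(0))
print(W.shape)  # (784, 256)
print(W.var())  # close to 2 / (784 + 256), i.e. about 0.00192
```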

Are there similar rules of thumb for convolutional layers?

I am struggling to figure out how best to initialize the weights of a convolutional layer. For example, take a layer whose weights have shape (5, 5, 3, 8): the kernel size is 5x5, the layer filters three input channels (RGB input), and it creates 8 feature maps. Would 3 be considered the number of input neurons? Or rather $75 = 5 \times 5 \times 3$, because the inputs are 5x5 patches for each color channel?

I would accept either a specific answer clarifying the problem or a more “generic” answer explaining the general process of finding the right weight initialization, preferably with links to sources.

In this case the number of input neurons should be $5 \times 5 \times 3 = 75$, since each output unit sees a 5x5 patch across all three input channels.
I have found Xavier initialization especially useful for convolutional layers. A uniform distribution over the interval $[-c/(in+out), c/(in+out)]$ often works as well.
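The same idea can be sketched for the (5, 5, 3, 8) kernel from the question. This sketch assumes the convention used by common frameworks such as Keras and PyTorch: fan-in is the receptive field size times the number of input channels ($5 \times 5 \times 3 = 75$) and fan-out is the receptive field size times the number of output channels ($5 \times 5 \times 8 = 200$). For the uniform case it uses the bound $\sqrt{6/(n_{in} + n_{out})}$ from the Glorot & Bengio paper rather than the $c/(in+out)$ interval above: a uniform distribution on $[-a, a]$ has variance $a^2/3$, so this bound gives $Var(W) = 2/(n_{in} + n_{out})$. The helper names `conv_fans` and `xavier_conv` are hypothetical.

```python
import numpy as np

def conv_fans(shape):
    """Fan-in and fan-out for a kernel of shape (kh, kw, in_channels, out_channels)."""
    kh, kw, c_in, c_out = shape
    receptive_field = kh * kw          # spatial taps per channel
    fan_in = receptive_field * c_in    # inputs feeding one output unit
    fan_out = receptive_field * c_out  # convention: taps times output channels
    return fan_in, fan_out

def xavier_conv(shape, uniform=False, rng=None):
    """Xavier-style sample with Var(W) = 2 / (fan_in + fan_out)."""
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = conv_fans(shape)
    if uniform:
        # U(-a, a) has variance a^2 / 3, so this limit yields the target variance.
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=shape)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=shape)

shape = (5, 5, 3, 8)
print(conv_fans(shape))  # (75, 200)
W = xavier_conv(shape, uniform=True, rng=np.random.default_rng(0))
print(W.shape)  # (5, 5, 3, 8)
```

Passing `uniform=False` instead draws from a normal distribution with the same variance.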