Some tutorials I came across state that “Xavier” weight initialization (paper: *Understanding the difficulty of training deep feedforward neural networks*) is an efficient way to initialize the weights of neural networks.

For fully-connected layers, those tutorials gave the following rule of thumb:

$$\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}, \qquad \text{simpler alternative:} \quad \mathrm{Var}(W) = \frac{1}{n_{in}}$$

where $\mathrm{Var}(W)$ is the variance of the weights for a layer, initialized with a normal distribution, and $n_{in}$, $n_{out}$ are the number of neurons in the parent layer and in the current layer, respectively.
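As a concrete illustration, here is a minimal NumPy sketch of both variants of this rule for a fully-connected layer (the layer sizes are made up for the example):

```python
import numpy as np

n_in, n_out = 256, 128  # hypothetical layer sizes

# Xavier/Glorot normal: zero-mean Gaussian with Var(W) = 2 / (n_in + n_out)
W = np.random.normal(loc=0.0,
                     scale=np.sqrt(2.0 / (n_in + n_out)),
                     size=(n_in, n_out))

# Simpler alternative: Var(W) = 1 / n_in
W_simple = np.random.normal(loc=0.0,
                            scale=np.sqrt(1.0 / n_in),
                            size=(n_in, n_out))
```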

Are there similar rules of thumb for convolutional layers?

I am struggling to figure out what would be best to initialize the weights of a convolutional layer. E.g., in a layer where the shape of the weights is `(5, 5, 3, 8)`, the kernel size is `5x5`, filtering three input channels (RGB input) and creating `8` feature maps. Would `3` be considered the number of input neurons? Or rather `75 = 5*5*3`, because the input consists of `5x5` patches for each color channel?

I would accept both a specific answer clarifying the problem and a more “generic” answer explaining the general process of finding the right weight initialization, preferably linking sources.

**Answer**

In this case the number of input neurons should be `75 = 5*5*3`: each output neuron is connected to a `5x5` patch in each of the three input channels.
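As a sketch of that counting in NumPy (assuming the `(height, width, in_channels, out_channels)` weight layout from the question, and assuming fan-out is counted the same way over the output maps, as e.g. Keras does):

```python
import numpy as np

kernel_h, kernel_w, in_channels, out_channels = 5, 5, 3, 8

# Each output neuron sees a 5x5 patch in every one of the 3 input channels:
n_in = kernel_h * kernel_w * in_channels    # 75
# Fan-out is counted the same way, over the 8 output feature maps:
n_out = kernel_h * kernel_w * out_channels  # 200

# Xavier/Glorot normal initialization of the convolution kernel
std = np.sqrt(2.0 / (n_in + n_out))
W = np.random.normal(0.0, std, size=(kernel_h, kernel_w, in_channels, out_channels))
```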

I found Xavier initialization especially useful for convolutional layers. Often a uniform distribution over the interval $\left[-c/\sqrt{n_{in}+n_{out}},\ c/\sqrt{n_{in}+n_{out}}\right]$ works as well; with $c = \sqrt{6}$, this uniform distribution has exactly the variance $2/(n_{in}+n_{out})$ from the rule above.
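A matching sketch of the uniform variant with $c = \sqrt{6}$:

```python
import numpy as np

kernel_h, kernel_w, in_channels, out_channels = 5, 5, 3, 8
n_in = kernel_h * kernel_w * in_channels
n_out = kernel_h * kernel_w * out_channels

# Glorot uniform: U[-limit, limit] with limit = sqrt(6 / (n_in + n_out)).
# A uniform on [-a, a] has variance a^2 / 3, so this gives 2 / (n_in + n_out).
limit = np.sqrt(6.0 / (n_in + n_out))
W = np.random.uniform(low=-limit, high=limit,
                      size=(kernel_h, kernel_w, in_channels, out_channels))
```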

It is implemented as an option in almost all neural network libraries; Keras, for example, ships Xavier Glorot’s initialization as the `glorot_normal` and `glorot_uniform` initializers.
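For example, with the Keras API (a minimal sketch; the `32x32` RGB input size is made up):

```python
import tensorflow as tf

# Conv2D producing 8 feature maps from 5x5 kernels; "glorot_uniform"
# is in fact the default kernel initializer in Keras.
layer = tf.keras.layers.Conv2D(
    filters=8,
    kernel_size=(5, 5),
    kernel_initializer="glorot_uniform",  # or "glorot_normal"
)

# Building against a hypothetical batch of 32x32 RGB images yields the
# kernel shape from the question: (5, 5, 3, 8).
layer.build((None, 32, 32, 3))
print(layer.kernel.shape)
```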

**Attribution**
*Source: Link, Question Author: daniel451, Answer Author: dontloo*