CNN Xavier weight initialization

Some tutorials state that “Xavier” weight initialization (paper: Understanding the difficulty of training deep feedforward neural networks) is an efficient way to initialize the weights of neural networks.

For fully-connected layers there was a rule of thumb in those tutorials:

$$\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}, \qquad \text{simpler alternative: } \mathrm{Var}(W) = \frac{1}{n_{in}}$$

where $\mathrm{Var}(W)$ is the variance of a layer's weights, drawn from a normal distribution, and $n_{in}$, $n_{out}$ are the numbers of neurons in the parent layer and in the current layer, respectively.

Are there similar rules of thumb for convolutional layers?

I am struggling to figure out how best to initialize the weights of a convolutional layer. E.g. in a layer where the shape of the weights is (5, 5, 3, 8), so the kernel size is 5x5, filtering three input channels (RGB input) and creating 8 feature maps, would 3 be considered the number of input neurons? Or rather 75 = 5*5*3, because the inputs are 5x5 patches for each color channel?

I would accept either a specific answer clarifying the problem or a more “generic” answer explaining the general process of finding the right weight initialization, preferably with links to sources.


In this case the number of input neurons should be 5*5*3 = 75.
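
As a rough sketch of how that count comes about, the snippet below computes the fan-in for a kernel laid out as (height, width, input channels, output channels), as in the question. The helper name `conv_fans` is made up for illustration. The fan-out value (receptive field size times output channels) is an assumption about the convention; it matches how Keras counts it, but other conventions are possible.

```python
from math import prod


def conv_fans(shape):
    """Return (fan_in, fan_out) for a conv kernel.

    Assumes the layout (*spatial_dims, in_channels, out_channels),
    e.g. (5, 5, 3, 8) for a 5x5 kernel, 3 input channels, 8 feature maps.
    """
    *spatial_dims, in_channels, out_channels = shape
    receptive_field_size = prod(spatial_dims)  # 5 * 5 = 25 in the example
    # Each output unit sees one patch from every input channel.
    fan_in = receptive_field_size * in_channels
    # Assumed convention: each input unit feeds one patch per output map.
    fan_out = receptive_field_size * out_channels
    return fan_in, fan_out


fan_in, fan_out = conv_fans((5, 5, 3, 8))
print(fan_in, fan_out)  # 75 200
```

So for the layer in the question, the values to plug into the rule of thumb would be 75 and 200 rather than 3 and 8.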

I found Xavier initialization especially useful for convolutional layers. Often a uniform distribution over the interval $[-c/(\text{in}+\text{out}),\; c/(\text{in}+\text{out})]$, for some constant $c$, works as well.
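
A minimal sketch of both variants, using only the Python standard library, might look like the following. The constant c above is left unspecified, so the uniform version here swaps in the limit from the Glorot and Bengio paper, sqrt(6 / (in + out)). A uniform distribution on [-a, a] has variance a²/3, so this limit gives the same variance 2 / (in + out) as the normal version. The function names and the hard-coded fan values (75 and 200, for the (5, 5, 3, 8) kernel) are illustrative assumptions.

```python
import math
import random
import statistics


def glorot_uniform(fan_in, fan_out, n, rng=random):
    """Draw n weights uniformly from [-limit, limit]."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [rng.uniform(-limit, limit) for _ in range(n)]


def glorot_normal(fan_in, fan_out, n, rng=random):
    """Draw n weights from a zero-mean normal with Var = 2 / (fan_in + fan_out)."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [rng.gauss(0.0, std) for _ in range(n)]


# Illustrative values for a (5, 5, 3, 8) kernel: 5*5*3 inputs, 5*5*8 outputs.
fan_in, fan_out = 75, 200
n_weights = 5 * 5 * 3 * 8

rng = random.Random(0)  # fixed seed so the run is repeatable
w_uniform = glorot_uniform(fan_in, fan_out, n_weights, rng)
w_normal = glorot_normal(fan_in, fan_out, n_weights, rng)

print("target variance: ", 2.0 / (fan_in + fan_out))
print("uniform variance:", statistics.pvariance(w_uniform))
print("normal variance: ", statistics.pvariance(w_normal))
```

With only 600 weights the empirical variances will scatter around the target; they get closer as the number of weights grows.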

It is implemented as an option in almost all neural network libraries, for example in Keras, whose source code includes an implementation of Xavier Glorot’s initialization.

Source : Link , Question Author : daniel451 , Answer Author : dontloo
