CNN Xavier weight initialization

In some tutorials I found, it was stated that “Xavier” weight initialization (paper: Understanding the difficulty of training deep feedforward neural networks) is an efficient way to initialize the weights of neural networks.

For fully-connected layers there was a rule of thumb in those tutorials:

$$\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}, \quad \text{simpler alternative:} \quad \mathrm{Var}(W) = \frac{1}{n_{in}}$$

where $\mathrm{Var}(W)$ is the variance of the weights for a layer, initialized with a normal distribution, and $n_{in}$, $n_{out}$ are the number of neurons in the parent layer and in the current layer, respectively.
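As a sanity check of the formula, here is a minimal NumPy sketch of both variants (the layer sizes 256 and 128 are hypothetical, just for illustration):

```python
import numpy as np

n_in, n_out = 256, 128  # hypothetical fully-connected layer sizes

# Xavier/Glorot normal: Var(W) = 2 / (n_in + n_out)
std = np.sqrt(2.0 / (n_in + n_out))
W = np.random.normal(loc=0.0, scale=std, size=(n_in, n_out))

# Simpler alternative: Var(W) = 1 / n_in
W_alt = np.random.normal(loc=0.0, scale=np.sqrt(1.0 / n_in), size=(n_in, n_out))

print(W.var(), W_alt.var())  # empirically close to 2/(n_in+n_out) and 1/n_in
```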

Are there similar rules of thumb for convolutional layers?

I am struggling to figure out how best to initialize the weights of a convolutional layer. E.g., in a layer where the shape of the weights is (5, 5, 3, 8), so the kernel size is 5×5, filtering three input channels (RGB input) and creating 8 feature maps: would 3 be considered the number of input neurons? Or rather 75 = 5*5*3, because the input is a 5×5 patch for each color channel?

I would accept either a specific answer clarifying the problem or a more “generic” answer explaining the general process of finding the right weight initialization, preferably with linked sources.

Answer

In this case the number of input neurons should be 5*5*3 = 75, since each output neuron in a feature map is connected to a 5×5 patch in each of the 3 input channels.
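A minimal sketch of how fan-in and fan-out are typically computed for that weight shape (this follows the convention used by common libraries such as Keras, where both are scaled by the receptive field size):

```python
import numpy as np

# Weight shape from the question: (kernel_h, kernel_w, in_channels, out_channels)
kernel_h, kernel_w, in_channels, out_channels = 5, 5, 3, 8

fan_in = kernel_h * kernel_w * in_channels    # 5*5*3 = 75
fan_out = kernel_h * kernel_w * out_channels  # 5*5*8 = 200

# Xavier/Glorot normal initialization of the kernel
std = np.sqrt(2.0 / (fan_in + fan_out))
W = np.random.normal(0.0, std, size=(kernel_h, kernel_w, in_channels, out_channels))
```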

I found it especially useful for convolutional layers. Often a uniform distribution over the interval $\left[-\sqrt{6/(n_{in}+n_{out})},\ \sqrt{6/(n_{in}+n_{out})}\right]$ works as well.
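The uniform variant targets the same variance as the normal one, since a uniform distribution on $[-a, a]$ has variance $a^2/3$. A quick sketch, reusing the fan values from above:

```python
import numpy as np

fan_in, fan_out = 75, 200  # values for the (5, 5, 3, 8) kernel above

# U[-a, a] has variance a**2 / 3, so a = sqrt(6 / (fan_in + fan_out))
# gives Var(W) = 2 / (fan_in + fan_out), matching the normal variant.
limit = np.sqrt(6.0 / (fan_in + fan_out))
W = np.random.uniform(-limit, limit, size=(5, 5, 3, 8))
```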

It is implemented as an option in almost all neural network libraries; Keras, for example, ships Xavier Glorot’s initialization as the glorot_uniform and glorot_normal initializers.
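For instance, the initializer can be selected by name when building a layer (a minimal sketch mirroring the layer shape from the question; glorot_uniform is Keras’s default kernel initializer):

```python
from tensorflow import keras

# Conv2D layer matching the question: 8 feature maps, 5x5 kernels.
layer = keras.layers.Conv2D(
    filters=8,
    kernel_size=(5, 5),
    kernel_initializer="glorot_normal",  # or "glorot_uniform" (the default)
)
```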

Attribution
Source: Link, Question Author: daniel451, Answer Author: dontloo
