Dropout: scaling the activation versus inverting the dropout

When applying dropout in artificial neural networks, one needs to compensate for the fact that at training time a portion of the neurons were deactivated. To do so, there exist two common strategies:

  • scaling the activation at test time
  • inverting the dropout during the training phase

The two strategies are summarized in the slides below, taken from Stanford CS231n: Convolutional Neural Networks for Visual Recognition.

Which strategy is preferable, and why?


Scaling the activation at test time:

[slide image not reproduced]

Inverting the dropout during the training phase:

[slide image not reproduced]
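
For readers without access to the slides, here is a minimal NumPy sketch of the two strategies. It is not the code from the slides: the function names, the single ReLU layer, and the shapes are assumptions chosen only for illustration.

```python
import numpy as np

def forward_scaled_at_test(x, W, keep_prob, train, rng):
    """Strategy 1: plain dropout while training, scale activations at test time."""
    a = np.maximum(0, W @ x)                      # ReLU activations (illustrative layer)
    if train:
        mask = rng.random(a.shape) < keep_prob    # keep each unit with probability keep_prob
        return a * mask
    return a * keep_prob                          # test: match the training-time expectation

def forward_inverted(x, W, keep_prob, train, rng):
    """Strategy 2: inverted dropout, rescale while training, leave test time untouched."""
    a = np.maximum(0, W @ x)
    if train:
        mask = (rng.random(a.shape) < keep_prob) / keep_prob   # drop, then scale up
        return a * mask
    return a                                      # test: no dropout-specific code at all

# Made-up weights and input, for illustration only.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
keep_prob = 0.5

print(forward_scaled_at_test(x, W, keep_prob, train=False, rng=rng))
print(forward_inverted(x, W, keep_prob, train=False, rng=rng))
```

In this sketch the two strategies differ only by a constant factor of keep_prob in the activations; what changes is where that factor is applied, at test time or at training time.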

Answer

Andrew Ng gives a very good explanation of this in the Dropout Regularization session of his Deep Learning course:

  • Inverted dropout is more common because it makes testing much easier: no dropout-specific scaling is needed at test time.
  • The purpose of the inversion is to ensure that the expected value of Z is not affected by the reduction in the activations feeding into it.

Say the last step of the dropout implementation for layer 3 is a3 = a3 / keep_prob:

Z[4] = W[4] * a[3] + b[4]. The mask D3 has zeroed out a fraction (1 - keep_prob) of the elements of a[3], so in expectation a[3] has been reduced by a factor of keep_prob, and the value of Z[4] is reduced along with it. To roughly compensate, we invert the change by dividing a[3] by keep_prob, so that the expected value of Z[4] is not affected.
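
The argument above can be checked numerically. The following is a rough sketch under assumed values (random made-up weights and activations, keep_prob = 0.8, hypothetical variable names); it averages Z[4] over many random masks D3, with and without the division by keep_prob.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8

# Made-up layer sizes and values, for illustration only.
a3 = rng.random(50) + 0.5              # activations of layer 3
W4 = rng.standard_normal((10, 50))     # weights of layer 4
b4 = np.zeros(10)                      # biases of layer 4 (zero, so ratios are easy to read)

z4_no_dropout = W4 @ a3 + b4

n_trials = 20000
d3 = rng.random((n_trials, 50)) < keep_prob   # one random mask D3 per row
a3_dropped = d3 * a3                          # a3 = a3 * D3, for every trial

z4_plain = a3_dropped @ W4.T + b4                     # no compensation
z4_inverted = (a3_dropped / keep_prob) @ W4.T + b4    # a3 = a3 / keep_prob

mean_plain = z4_plain.mean(axis=0)
mean_inverted = z4_inverted.mean(axis=0)

# Without inversion the average Z[4] shrinks by roughly keep_prob;
# with inversion it stays close to the no-dropout value.
print(np.abs(mean_plain - keep_prob * z4_no_dropout).max())
print(np.abs(mean_inverted - z4_no_dropout).max())
```

The compensation holds only on average: any single forward pass with dropout still differs from the no-dropout value, which is why the text says "roughly".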


Attribution
Source: Link, Question Author: Franck Dernoncourt, Answer Author: xmindata
