When applying dropout in artificial neural networks, one must compensate for the fact that at training time a fraction of the neurons is deactivated. To do so, there are two common strategies:
- scaling the activation at test time
- inverting the dropout during the training phase
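The two strategies can be sketched side by side. This is a minimal illustration, assuming numpy and arbitrary example shapes; the variable names (`keep_prob`, `mask`) are illustrative, not from any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8                      # probability of keeping a unit
a = rng.random((4, 3))               # example activations

# Strategy 1: vanilla dropout -- scale at TEST time.
mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
a_train = a * mask                   # training: drop units, no rescaling
a_test = a * keep_prob               # test: scale activations by keep_prob

# Strategy 2: inverted dropout -- scale at TRAINING time.
mask2 = (rng.random(a.shape) < keep_prob).astype(a.dtype)
a_train_inv = (a * mask2) / keep_prob  # training: drop AND rescale
a_test_inv = a                         # test: use activations unchanged
```

In both cases the expected activation seen by the next layer matches the test-time activation; the strategies differ only in *when* the scaling happens.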
The two strategies are summarized in the slides below, taken from Stanford CS231n: Convolutional Neural Networks for Visual Recognition.
Which strategy is preferable, and why?
Scaling the activation at test time:
Inverting the dropout during the training phase:
Andrew Ng gives a very good explanation of this in the Dropout Regularization lesson of his Deep Learning course:
- Inverted dropout is more common because it makes test time much simpler: no scaling is needed at inference.
- The purpose of the inversion is to ensure that the expected value of Z is not affected by the activations that were dropped.
The division a3 = a3 / keep_prob is the last step of the implementation. In the next layer's pre-activation Z = W * a + b, a fraction (1 - keep_prob) of the elements of a has been zeroed out by the mask D3, so the expected value of Z would shrink by the same factor. To compensate, we invert the change by dividing the surviving activations by keep_prob, which keeps the expected value of Z unchanged.
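The steps above can be sketched in numpy. This is a minimal sketch following the course's notation (D3, a3, keep_prob); the shapes and the next-layer names W4, b4, Z4 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
keep_prob = 0.8

a3 = rng.random((5, 10))   # layer-3 activations: 5 units, 10 examples
W4 = rng.random((2, 5))    # next layer's weights
b4 = np.zeros((2, 1))      # next layer's bias

# Inverted dropout on a3:
D3 = rng.random(a3.shape) < keep_prob  # boolean mask, keeps ~keep_prob of units
a3 = a3 * D3                           # zero out the dropped units
a3 = a3 / keep_prob                    # invert: rescale survivors by 1/keep_prob

# Because E[a3 after dropout] equals the original a3, the expected
# value of Z4 is the same as without dropout.
Z4 = W4 @ a3 + b4
```

Averaged over many random masks, the rescaled a3 converges to the original activations, which is exactly why Z stays unaffected in expectation.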