When applying dropout in artificial neural networks, one needs to compensate for the fact that at training time a portion of the neurons were deactivated. To do so, there exist two common strategies:
- scaling the activation at test time
- inverting the dropout during the training phase
The two strategies are summarized in the slides below, taken from Stanford CS231n: Convolutional Neural Networks for Visual Recognition.
Which strategy is preferable, and why?
Scaling the activation at test time:
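(The slide image is not reproduced here; the following is a minimal NumPy sketch of this strategy. The names `p`, `W1`, `x` and the ReLU layer are illustrative assumptions, not taken from the slide.)

```python
import numpy as np

p = 0.5  # probability of keeping a unit active (illustrative value)

def train_step(x, W1):
    # Vanilla dropout: drop units at train time, leave the kept activations unscaled.
    h = np.maximum(0, W1 @ x)            # hidden activations (ReLU, illustrative layer)
    mask = np.random.rand(*h.shape) < p  # keep each unit with probability p
    return h * mask                      # dropped units are zeroed

def predict(x, W1):
    # At test time all units are active, so scale the activations by p
    # to match the expected activation seen during training.
    h = np.maximum(0, W1 @ x)
    return h * p
```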
Inverting the dropout during the training phase:
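(Again the slide itself is not shown; here is a comparable sketch of inverted dropout, under the same illustrative setup as above.)

```python
import numpy as np

p = 0.5  # probability of keeping a unit active (illustrative value)

def train_step_inverted(x, W1):
    # Inverted dropout: rescale the kept units by 1/p at train time,
    # so the expected activation already matches the test-time value.
    h = np.maximum(0, W1 @ x)
    mask = (np.random.rand(*h.shape) < p) / p  # kept units scaled by 1/p
    return h * mask

def predict_inverted(x, W1):
    # The test-time forward pass needs no dropout-related change.
    return np.maximum(0, W1 @ x)
```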
Answer
Andrew Ng gives a very good explanation of this in his Deep Learning course, in the session on Dropout Regularization:
- Inverted dropout is more common because it makes testing much simpler.
- The purpose of the inversion is to ensure that the value of Z is not affected by the reduction in the activations caused by dropout.
Say we apply a3 = a3 / keep_prob at the last step of the dropout implementation.
In the next layer's forward pass:
Z[4] = W[4] * a[3] + b[4]. Because the dropout mask D3 has zeroed out a fraction (1 - keep_prob) of the elements of a[3], the expected value of a[3] is reduced, and so the value of Z[4] would be reduced as well. To roughly compensate for this, we invert the change by dividing a[3] by keep_prob, so that the value of Z[4] is not affected.
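A small NumPy sketch of this step may help; the names `a3`, `D3`, `keep_prob`, `W4`, `b4`, `Z4` follow the answer's notation, while the shapes and concrete values are illustrative assumptions:

```python
import numpy as np

keep_prob = 0.8
a3 = np.random.rand(5, 1)                   # layer-3 activations (illustrative shape)

D3 = np.random.rand(*a3.shape) < keep_prob  # dropout mask for layer 3
a3 = a3 * D3                                # a fraction (1 - keep_prob) of units is zeroed
a3 = a3 / keep_prob                         # "invert": restore the expected value of a3

W4 = np.random.randn(3, 5)                  # illustrative layer-4 parameters
b4 = np.zeros((3, 1))
Z4 = W4 @ a3 + b4                           # expected value of Z4 is unchanged by dropout
```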
Attribution
Source: Link, Question Author: Franck Dernoncourt, Answer Author: xmindata