When implementing dropout (or drop connect) – do you need to account for the case that every node in a layer is dropped?

Even though this is a very small chance, what is the correct approach to take in this scenario? Pick a new random set to drop out or set all the inputs to the next layer to be zero?

Does anyone know what the popular libraries (TensorFlow, Keras, etc.) do in this situation?

**Answer**

This is a concern that will very rarely ever be realized. For a moderately sized neural network whose hidden layers each have 1000 units, if the dropout probability is set to p = 0.5 (the high end of what's typically used), then the probability of all 1000 units being dropped is 0.5^1000 ≈ 9.3×10^-302, which is a mind-bogglingly tiny value. Even for a very small neural network with only 50 units in the hidden layer, the probability of all units being dropped is 0.5^50 ≈ 8.9×10^-16, or less than 1 in a thousand trillion.
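As a sanity check on those numbers, both probabilities are small enough to compute directly (a Python float can represent values down to about 1e-308):

```python
# Probability that every unit in a layer is dropped when each
# unit is dropped independently with probability p = 0.5.
p_all_dropped_1000 = 0.5 ** 1000   # 1000-unit layer
p_all_dropped_50 = 0.5 ** 50       # 50-unit layer

print(p_all_dropped_1000)  # ~9.33e-302
print(p_all_dropped_50)    # ~8.88e-16
```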

So in short, this isn’t something you ever need to worry about in most real-world situations, and in the rare instances where it does happen, you could simply rerun the dropout step to obtain a new set of dropped weights.
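If you did want to guard against it, the "rerun" approach is a one-line loop around the mask sampling. A minimal sketch (a hypothetical helper, not taken from any particular library):

```python
import random

def dropout_mask(n_units, p_drop, rng=None):
    """Sample a binary keep-mask, resampling in the (astronomically
    unlikely) event that every unit was dropped."""
    rng = rng or random.Random()
    while True:
        mask = [0 if rng.random() < p_drop else 1 for _ in range(n_units)]
        if any(mask):  # at least one unit survived; accept the mask
            return mask

mask = dropout_mask(50, 0.5)
print(sum(mask), "of", len(mask), "units kept")
```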

## UPDATE:

Digging through the source code for TensorFlow, I found the implementation of dropout here. TensorFlow doesn’t even bother accounting for the special case where all of the units are zero. If this happens to occur, then the output from that layer will simply be zero. The units don’t “disappear” when dropped, they just take on the value zero, which from the perspective of the other layers in the network is perfectly fine. They can perform their subsequent operations on a vector of zeros just as well as on a vector of non-zero values.
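To illustrate that last point, here is a toy dense layer (plain Python, purely illustrative and not TensorFlow's implementation) fed an all-zero activation vector. The computation goes through without issue; the output is simply the bias terms:

```python
# A dropped-out layer feeds a dense layer. Zeros are valid inputs;
# the weighted sums vanish and only the biases remain.
def dense(x, weights, biases):
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, biases)]

dropped = [0.0, 0.0, 0.0]                    # every unit zeroed by dropout
W = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]     # example weights
b = [0.1, -0.3]                              # example biases
print(dense(dropped, W, b))                  # -> [0.1, -0.3], just the biases
```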

**Attribution**
*Source: Link, Question Author: Dan, Answer Author: jon_simon*