I built a convolutional neural network and want to verify, using numerical gradient checking, that my gradients are being calculated correctly.
The question is, how close is close enough?
For each weight, my checking function just prints the calculated derivative, the numerically approximated derivative, the difference between the two, and whether or not the two values have the same sign (one being positive and the other negative is a big no-no).
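A checker of the kind described could be sketched roughly as below. This is only an illustration, not the actual code: the name check_gradients, the central-difference formula, the step size eps=1e-5, and the toy quadratic loss are all assumptions.

```python
import numpy as np

def check_gradients(loss_fn, weights, analytic_grad, eps=1e-5):
    """Compare analytic gradients with central-difference estimates, one weight at a time.

    Assumes `weights` is a C-contiguous float array, so ravel() returns a view
    and the perturbations below are visible to loss_fn.
    """
    rows = []
    flat = weights.ravel()
    flat_grad = analytic_grad.ravel()
    for i in range(flat.size):
        original = flat[i]
        flat[i] = original + eps          # nudge one weight up
        loss_plus = loss_fn(weights)
        flat[i] = original - eps          # nudge the same weight down
        loss_minus = loss_fn(weights)
        flat[i] = original                # restore the weight
        numeric = (loss_plus - loss_minus) / (2 * eps)
        analytic = flat_grad[i]
        same_sign = bool(np.sign(analytic) == np.sign(numeric))
        rows.append((analytic, numeric, abs(analytic - numeric), same_sign))
    return rows

# Toy stand-in for a network: loss = sum(w**2), whose gradient is 2*w.
w = np.array([[0.5, -1.5], [2.0, 3.0]])
rows = check_gradients(lambda x: np.sum(x ** 2), w, 2 * w)
for analytic, numeric, diff, same_sign in rows:
    print(f"{analytic: .12f}  {numeric: .12f}  {diff:.3e}  {same_sign}")
```

Each printed row holds the analytic value, the numerical estimate, their absolute difference, and the sign check, in that order.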
The main concern I have is that for all the fully connected layers and all the convolutional layers except the first one, the differences look similar: the first 9 to 13 characters of the two numbers match. That sounds good enough, right? But for weights of the first convolutional layer, sometimes up to 12 decimal places match, and sometimes as few as 3. Is that enough, or could there be an error?
One good thing to note is that the signs of the two values always match, so the network will always move in the right direction, even if the magnitude of the movement is a bit off. But that’s the question… is there a chance that it is off?
The closest I have seen to addressing this was in the Stanford UFLDL tutorial within the softmax regression section. Copying the key statement:
"The norm of the difference between the numerical gradient and your analytical gradient should be small, on the order of 10^-9."
In Python the code would look something like this (assuming norm is numpy.linalg.norm):
from numpy.linalg import norm
norm(gradients - numericalGradients)/norm(gradients + numericalGradients)
Here, gradients are your results from the derivative and
numericalGradients are the approximated gradients.
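The ratio above can be wrapped in a small helper, sketched here under a few assumptions: the function name relative_error, the handling of a zero denominator, and the sample values are made up for illustration, and the 1e-9 threshold is simply the figure from the quoted tutorial (what counts as small enough in practice depends on floating-point precision and the step size used for the numerical estimate).

```python
import numpy as np

def relative_error(gradients, numerical_gradients):
    """Return norm(a - n) / norm(a + n) for analytic a and numerical n."""
    num = np.linalg.norm(gradients - numerical_gradients)
    den = np.linalg.norm(gradients + numerical_gradients)
    if den == 0.0:
        # Both all zero -> no disagreement; exact opposites -> treat as infinitely bad.
        return 0.0 if num == 0.0 else float("inf")
    return num / den

# Made-up values: the numerical estimate is off by 1e-10 in every entry.
gradients = np.array([0.1, -0.2, 0.3])
numerical_gradients = gradients + 1e-10

error = relative_error(gradients, numerical_gradients)
print(f"relative error: {error:.3e}")
print("within 1e-9:", error < 1e-9)
```

Because the ratio is normalised by the size of the gradients, it can be compared across layers whose gradients differ a lot in magnitude, which a raw digit-by-digit comparison cannot do.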