If I have a convolutional neural network (CNN) with about 1,000,000 parameters, how much training data is needed (assume I am doing stochastic gradient descent)? Is there a rule of thumb?

Additional notes: When I performed stochastic gradient descent (e.g., mini-batches of 64 patches per iteration), the classifier's accuracy reached a roughly steady value after ~10,000 iterations. Does this mean that not much data is needed, say on the order of 100k-1,000k examples?

**Answer**

In order to figure out whether or not more data will be helpful, you should compare the performance of your algorithm on the training data (i.e. the data used to train the neural network) to its performance on testing data (i.e. data the neural network did not “see” in training).
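A minimal sketch of this comparison (this is illustrative code, not from the answer, using a tiny logistic-regression classifier on synthetic data as a stand-in for the CNN): train with mini-batch SGD while recording accuracy on both the training set and a held-out test set, so the two curves can be compared afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data: labels come from a noisy linear rule.
X = rng.normal(size=(2000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true + 0.5 * rng.normal(size=2000) > 0).astype(float)

# Hold out data the model never "sees" during training.
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

def accuracy(w, X, y):
    """Fraction of examples classified correctly by the linear model w."""
    return np.mean(((X @ w) > 0) == y)

w = np.zeros(20)
lr, batch = 0.1, 64
train_acc, test_acc = [], []
for it in range(500):
    # One SGD step on a random mini-batch (logistic loss).
    idx = rng.integers(0, len(X_train), size=batch)
    Xb, yb = X_train[idx], y_train[idx]
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))   # sigmoid predictions
    w -= lr * Xb.T @ (p - yb) / batch     # gradient of mean logistic loss
    # Record both curves at every iteration for later comparison.
    train_acc.append(accuracy(w, X_train, y_train))
    test_acc.append(accuracy(w, X_test, y_test))

print(f"final train acc: {train_acc[-1]:.3f}, test acc: {test_acc[-1]:.3f}")
```

Plotting `train_acc` and `test_acc` against the iteration number gives exactly the two curves discussed below.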

A good thing to check would be the error (or accuracy) on each set as a function of iteration number. There are two possibilities for the outcome of this:

1) The training error converges to a value significantly lower than the testing error. If this is the case, the performance of your algorithm will almost certainly improve with more data.

2) The training error and the testing error converge to about the same value (with the training error still probably being slightly lower than the testing error). In this case, additional data by itself will not help your algorithm. If you need better performance than you are getting at this point, you should try adding more neurons to your hidden layers, or adding more hidden layers. If you add enough hidden units, the testing error will become noticeably higher than the training error, and more data will help at that point.
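The decision rule in the two cases above can be sketched as a small helper (the function name and the 0.05 gap threshold are illustrative choices, not from the answer): average the tail of each error curve and compare the gap.

```python
import numpy as np

def diagnose(train_errors, test_errors, gap_threshold=0.05, tail=10):
    """Compare converged training and testing error over the last `tail`
    iterations and report which of the two cases applies."""
    train_err = float(np.mean(train_errors[-tail:]))
    test_err = float(np.mean(test_errors[-tail:]))
    if test_err - train_err > gap_threshold:
        # Case 1: training error well below testing error.
        return "high variance: more training data will likely help"
    # Case 2: both errors converge to about the same value.
    return "high bias: add more neurons or layers; more data alone will not help"

# Case 1: large train/test gap.
print(diagnose([0.05] * 20, [0.20] * 20))
# Case 2: both errors about the same.
print(diagnose([0.18] * 20, [0.20] * 20))
```

The threshold is problem-dependent; in practice you would judge the gap relative to the error level you care about.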

For a more thorough and helpful introduction to how to make these decisions, I highly recommend Andrew Ng’s Coursera course, particularly the “Evaluating a learning algorithm” and “Bias vs. Variance” lessons.

**Attribution**
*Source : Link , Question Author : RockTheStar , Answer Author : Kevin Lyons*