I have seen posts where the discussion centered on the effect of a large or small total number of neurons in a neural network, especially with respect to the network's potential to overfit or underfit. The general idea I got is that too few neurons underfit and too many overfit, which makes sense.
Upon thinking about it a bit more, I think it also makes sense to talk about the effect of the number of neurons per layer. My intuition is that even if the total number of neurons in a deep neural network is "the right amount" (this being problem/model-specific), a network where one hidden layer is very large and the remaining layers are small would perform worse than a network with the same number of hidden layers and the same total number of neurons, but with the neurons distributed more evenly across layers.
So the question is: when analyzing overfitting, underfitting, and performance of a deep neural network, what are the differences between the effect of the total number of neurons and the effect of the number of neurons per layer?
There are other important factors to consider in the underfitting/overfitting discussion, namely regularization techniques: for example, L1 and L2 regularization, pooling, and data augmentation. So it is not only about the number of neurons.
In recent work, people like to build a large network and, at the same time, apply a lot of regularization to it. For example, it is acceptable for a model to have more parameters than data points.
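To make the "more parameters than data points" point concrete, here is a minimal sketch (my own toy example, not from any particular paper) of L2-regularized linear regression with 100 parameters fit to only 20 data points. Without the penalty the normal equations are singular; the L2 term makes the problem well-posed:

```python
import numpy as np

# Toy over-parameterized model tamed by L2 regularization:
# 100 parameters, only 20 data points.
rng = np.random.default_rng(0)
n, d = 20, 100                        # fewer data points than parameters
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [1.0, -2.0, 0.5]         # only 3 truly informative weights
y = X @ true_w + 0.01 * rng.normal(size=n)

lam = 1.0                             # L2 (ridge) strength
# X.T @ X has rank at most 20, so it is singular, but adding lam * I
# makes it invertible and the regularized fit has a unique solution.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

The same idea, gradient-based weight decay, is what keeps heavily over-parameterized deep networks from simply memorizing the training set.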
On the question of many neurons in one layer versus many layers:
Theoretically, an MLP with a single hidden layer can approximate any continuous function, given enough neurons in that layer (the universal approximation theorem).
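Informally (in the style of the Cybenko/Hornik statements), the theorem says: for any continuous $f$ on a compact set $K$ and any $\varepsilon > 0$, there exist finitely many units $N$, weights $w_i, \alpha_i$, and biases $b_i$ such that

$$\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^\top x + b_i\right) \right| < \varepsilon,$$

where $\sigma$ is a suitable activation (e.g. sigmoid). Note the result is about existence, not about how large $N$ must be or whether training can find those weights.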
In practice, especially in vision problems, people prefer more layers over a large number of neurons in a single layer (deep rather than wide).
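One concrete difference between the two arrangements: the same total number of hidden neurons gives different parameter counts depending on how the neurons are stacked. A small sketch (the layer sizes are hypothetical, chosen just for illustration):

```python
def mlp_param_count(layer_sizes):
    """Weights + biases of a fully connected net with the given widths
    (input, hidden layers..., output)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Both nets have 60 hidden neurons in total, on a 10-d input, 1 output:
wide = mlp_param_count([10, 60, 1])           # one wide hidden layer
deep = mlp_param_count([10, 20, 20, 20, 1])   # three narrow hidden layers
print(wide, deep)  # 721 1081
```

So "total number of neurons" alone does not pin down model capacity: depth changes the parameter count and, more importantly, lets the network compose features hierarchically.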
Check this post for details.
The EfficientNet paper is an interesting read on searching for a better network structure; it scales depth, width, and input resolution jointly rather than tuning any one of them in isolation.