# What happens when a model has more parameters than training samples?

In a simple neural network, the number of parameters is usually kept small compared to the number of training samples, and this presumably forces the model to learn the patterns in the data. Right?

My question is: what repercussions could we face in a scenario where the number of parameters in a model exceeds the number of training instances available?

Can such a model overfit? What effect do those extra parameters have on model performance?

Kindly shed some light on this. I also believe that it is only the network architecture (number of hidden layers, number of neurons in each layer, etc.) that governs the number of parameters in the model. Is my understanding correct?

When talking about neural networks (nowadays especially deep neural networks), it is nearly always the case that the network has far more parameters than training samples.

Theoretically, a simple two-layer neural network with $2n+d$ parameters is capable of perfectly fitting any dataset of $n$ samples of dimension $d$ (Zhang et al., 2017). So to answer your question: yes, such a large model can overfit.
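You can see this memorization behavior directly in a toy experiment. Below is a minimal sketch (my own illustrative setup, not the construction from Zhang et al.): a two-layer ReLU network with far more parameters than samples, trained by plain gradient descent on *random* targets, still drives the training error toward zero. There is no pattern to learn here; the network simply memorizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 5 samples of dimension 3, but 3*32 + 32 + 32 + 1 = 161 parameters.
n, d, h = 5, 3, 32
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))             # random labels: nothing to "learn", only memorize

W1 = rng.normal(size=(d, h)) * 0.5
b1 = np.zeros(h)
W2 = rng.normal(size=(h, 1)) * 0.5
b2 = np.zeros(1)
num_params = W1.size + b1.size + W2.size + b2.size

lr = 0.05
for _ in range(5000):                   # full-batch gradient descent on MSE
    z = X @ W1 + b1
    a = np.maximum(z, 0.0)              # ReLU
    err = (a @ W2 + b2) - y
    gW2 = a.T @ err / n                 # backpropagated gradients
    gb2 = err.mean(axis=0)
    gz = (err @ W2.T) * (z > 0)
    gW1 = X.T @ gz / n
    gb1 = gz.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

train_mse = float(np.mean((np.maximum(X @ W1 + b1, 0.0) @ W2 + b2 - y) ** 2))
```

Near-zero training error on random labels is exactly the kind of fit that cannot generalize, which is why the regularization effects discussed below matter so much in practice.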

The awesome thing about deep neural networks is that they work very well despite these potential overfitting problems. Usually this is thanks to various regularization effects implicit in the training/optimization algorithm and the network architecture, and to explicit regularization methods such as dropout, weight decay, and data augmentation. My paper Regularization for Deep Learning: A Taxonomy describes some of these effects in depth.
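To make "weight decay" concrete, here is a minimal sketch of its simplest linear form (ridge regression) rather than a deep network: an L2 penalty on the weights is added to the loss, and a stronger penalty shrinks the fitted weights. Note that with more features than samples the unpenalized normal equations would be singular; the penalty is what makes the problem well-posed at all.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty (weight decay in its simplest form):
    minimizes ||X w - y||^2 + lam * ||w||^2, solved in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 20))            # more features (parameters) than samples
y = rng.normal(size=5)

w_weak = ridge_fit(X, y, lam=0.1)
w_strong = ridge_fit(X, y, lam=10.0)    # stronger decay -> smaller weight norm
```

In deep learning the same idea appears as the weight-decay term of the optimizer update, applied to all network weights rather than solved in closed form.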

The obvious benefit of having many parameters is that you can represent much more complicated functions than with fewer parameters. The relationships that neural networks model are often very complicated, and using a small network (adapting the size of the network to the size of the training set, i.e. making your data look big just by using a small model) can leave your network too simple to represent the desired mapping (high bias). On the other hand, if you have many parameters, the network is flexible enough to represent the desired mapping, and you can always employ stronger regularization to prevent overfitting.
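The high-bias side of this trade-off can be illustrated with polynomials standing in for network capacity (a hypothetical stand-in, not a neural network): a degree-1 model simply cannot represent a sine wave no matter how it is trained, while a higher-degree model with more parameters fits the training points closely.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 20)
y = np.sin(x)

# Few parameters: a straight line is too simple for sin(x) -> high bias.
small = np.polyval(np.polyfit(x, y, 1), x)
# Many parameters: a degree-9 polynomial can represent the mapping well.
large = np.polyval(np.polyfit(x, y, 9), x)

mse_small = float(np.mean((small - y) ** 2))
mse_large = float(np.mean((large - y) ** 2))
```

The small model's error is irreducible by more training; only added capacity (or a better model family) fixes it, which is the argument for large, regularized networks.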

To answer the last part of your question: the number of parameters is fully determined by the number of layers in the network, the number of units in each layer, and the dimensionality of the input and the output.
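For fully connected networks this count can be written down directly. A small illustrative helper (assuming dense layers with biases; `count_params` is a hypothetical name, not a library function):

```python
def count_params(sizes):
    """Parameter count of a fully connected network.

    sizes lists the layer widths from input to output; each consecutive
    pair of widths (m, k) contributes m*k weights plus k biases.
    """
    return sum(m * k + k for m, k in zip(sizes, sizes[1:]))

# e.g. input dim 4, one hidden layer of 8 units, output dim 2:
# 4*8 + 8 + 8*2 + 2 = 58 parameters
n_params = count_params([4, 8, 2])
```

Convolutional or weight-shared layers follow different formulas, but the principle is the same: the architecture alone fixes the parameter count.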