I have a (feed-forward single layer) neural network with which I try to predict an environment-related variable from two financial variables (regression). I use the “train” function from the caret package.

I use the `nnet()` algorithm in the caret package. I have two continuous predictors and 420 data points.

For theoretical understanding, I am trying to overfit my model on purpose; to my understanding, this should normally work with every dataset, e.g. by increasing the “size” (i.e. the number of hidden units). However, drastically increasing the number of hidden units does not lead to overfitting.

Thus, is it wrong to assume that you can overfit every neural network by increasing “size”? Which other variable could lead to overfitting instead?

    library(caret)

    grid <- expand.grid(size = 20)

    control <- trainControl(
      method = "cv",
      number = 10,
      verboseIter = TRUE
    )

    fit <- train(
      x = train_parametres,
      y = train_result,
      method = "mlp",
      metric = "Rsquared",
      learnFunc = "Std_Backpropagation",
      learnFuncParams = c(0.2, 0.0),
      maxit = 1000,
      trControl = control,
      tuneGrid = grid,
      preProcess = c("center", "scale"),
      linout = TRUE,
      verbose = TRUE,
      allowParallel = TRUE
    )

**Answer**

The reason to try to overfit a data set is to understand the model capacity needed to represent it.

If your model capacity is too low, you won’t be able to represent your data set. When you increase the model capacity until you can fully represent your data set, you know you have found the minimal capacity.

Overfitting is not the goal here; it is a by-product. Your model probably represents the **data set** and not necessarily the **concept**. If you try this model on a test set, the performance will probably be lower, indicating the overfit.
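As a minimal sketch of such a capacity sweep, the code below compares training error and held-out error for several values of `size`. It assumes simulated data (420 points, two continuous predictors, a made-up nonlinear target) rather than the asker's data, and it calls `nnet::nnet()` directly instead of going through caret.

```r
library(nnet)  # recommended package, normally installed together with R

set.seed(42)

# Hypothetical data: 420 points, two continuous predictors, noisy nonlinear target.
n  <- 420
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
dat <- data.frame(x1 = x1, x2 = x2,
                  y = sin(4 * x1) * cos(3 * x2) + rnorm(n, sd = 0.3))

# Hold out 30% of the rows to measure how well each fit generalizes.
test_idx <- sample(n, size = round(0.3 * n))
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

sizes   <- c(1, 5, 20, 50)
results <- data.frame(size = sizes, train_rmse = NA_real_, test_rmse = NA_real_)

for (i in seq_along(sizes)) {
  # decay = 0 (the default) means no weight penalty is applied.
  fit <- nnet(y ~ x1 + x2, data = train, size = sizes[i],
              linout = TRUE, decay = 0, maxit = 2000, trace = FALSE)
  results$train_rmse[i] <- rmse(train$y, predict(fit, train))
  results$test_rmse[i]  <- rmse(test$y,  predict(fit, test))
}

print(results)
```

If the training error stops falling as `size` grows while staying well above zero, capacity is probably not the bottleneck. A training error that keeps shrinking while the held-out error stalls or rises is the usual signature of overfitting.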

However, model capacity is not the only reason that a model cannot represent a concept. It is possible that the concept doesn’t belong to the family of functions represented by your model, as when your NN is linear and the concept is not. It is possible that the input is not enough to distinguish between the samples, or that your optimization algorithm simply failed to find the proper solution.
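The "wrong family of functions" case can be illustrated with a small sketch on made-up data. The target below is essentially the product of the two predictors, which a purely linear model cannot represent even on the training set. `lm()` stands in for a linear network here; that substitution is an assumption made for brevity.

```r
set.seed(1)

# Hypothetical data: the concept is an interaction, y = x1 * x2, plus a little noise.
n  <- 420
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
y  <- x1 * x2 + rnorm(n, sd = 0.05)

# Linear family: y ~ a + b * x1 + c * x2, i.e. what a network with no
# nonlinearity can compute.
fit_linear <- lm(y ~ x1 + x2)

# Same data, but the family now contains the product term x1:x2.
fit_product <- lm(y ~ x1 * x2)

r2_linear  <- summary(fit_linear)$r.squared
r2_product <- summary(fit_product)$r.squared

cat("Training R^2, linear family:    ", round(r2_linear, 3), "\n")
cat("Training R^2, with product term:", round(r2_product, 3), "\n")
```

The linear fit stays poor on its own training data, so no amount of extra training would make it overfit; only changing the function family helps.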

In your case, you have only two predictors. If they were binary, it is quite likely that you couldn’t represent too much with them.

Assuming that they are bounded and smooth, you can try to bin them.

If you get high entropy in the bins (e.g., a bin with a 50%-50% distribution), no logic relying only on these features will be able to tell the samples apart.
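A rough way to run this check in R, sketched on simulated data: since the target in the question is continuous, the within-bin spread of the target plays the role that class entropy plays in the classification case. That substitution is an assumption of this sketch, not part of the answer, and the bin count `k` and all variable names are illustrative.

```r
set.seed(7)

# Hypothetical data: only x1 carries signal, and the noise is large.
n  <- 420
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + rnorm(n, sd = 0.5)

# Cut each predictor into k quantile bins, then cross them into a k x k grid.
k <- 5
quantile_bins <- function(x, k) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, length.out = k + 1)),
      include.lowest = TRUE)
}
cell <- interaction(quantile_bins(x1, k), quantile_bins(x2, k), drop = TRUE)

# Spread of the target inside each cell: large values mean the two predictors
# cannot tell those samples apart at this resolution.
cell_sd <- tapply(y, cell, sd)

# Share of the target's variance explained by cell membership alone.
cell_mean   <- ave(y, cell)
r2_by_cells <- 1 - sum((y - cell_mean)^2) / sum((y - mean(y))^2)

print(round(summary(as.vector(cell_sd)), 3))
cat("Variance explained by the grid:", round(r2_by_cells, 3), "\n")
```

If the within-cell spread stays close to the overall spread of `y` even on a fine grid, the two predictors probably do not carry enough information, and a larger `size` is unlikely to help. The in-sample figure is optimistic when cells contain only a few points.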

**Attribution**

*Source: Link, Question Author: Requin, Answer Author: DaL*