I have a simple regression model (y = param1*x1 + param2*x2).
When I fit the model to my data, I find two good solutions:
Solution A, params=(2,7), is best on the training set with RMSE=2.5.
BUT! Solution B, params=(24,20), wins big on the validation set when I do cross-validation.
Solution A is surrounded by bad solutions, so when I use solution A the model is more sensitive to variations in the data.
Solution B is surrounded by OK solutions, so it is less sensitive to changes in the data.
Is this a brand new theory I’ve just invented, that solutions with good neighbours are less prone to overfitting? :))
Are there generic optimisation methods that would help me favour solution B over solution A?
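To make the "good neighbours" intuition concrete, here is a minimal sketch (using synthetic data in place of the OP's unknown dataset, and the OP's two candidate parameter pairs) that scores each solution by its own RMSE and by the average RMSE over random perturbations of its parameters, a crude measure of how flat the loss surface is around it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data standing in for the OP's: two predictors, noisy target.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 * x1 + 7 * x2 + rng.normal(scale=2.5, size=n)

def rmse(params, x1, x2, y):
    p1, p2 = params
    return np.sqrt(np.mean((y - (p1 * x1 + p2 * x2)) ** 2))

def neighbourhood_rmse(params, x1, x2, y, radius=1.0, k=50):
    """Average RMSE over random perturbations of the parameters:
    a crude 'flatness' score for the solution's neighbourhood."""
    perturbed = params + rng.normal(scale=radius, size=(k, 2))
    return np.mean([rmse(p, x1, x2, y) for p in perturbed])

sol_a = np.array([2.0, 7.0])
sol_b = np.array([24.0, 20.0])
for name, sol in [("A", sol_a), ("B", sol_b)]:
    print(name, rmse(sol, x1, x2, y), neighbourhood_rmse(sol, x1, x2, y))
```

A solution whose neighbourhood RMSE is close to its own RMSE sits in a flat basin; a large gap signals a sharp minimum that may be sensitive to data variation. The radius and sample count here are arbitrary illustration choices.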
The only way for the RMSE to have two local minima is for the residuals between model and data to be nonlinear. Since one of these, the model, is linear (in 2D), the other, i.e., the y data, must be nonlinear, either with respect to the underlying tendency of the data, or the noise function of that data, or both.
Therefore, a better model, a nonlinear one, would be the starting point for investigating the data. Moreover, without knowing more about the data, one cannot say with any certainty what regression method should be used. I can offer that Tikhonov regularization, or the related ridge regression, would be a good way to address the OP's question. However, what smoothing factor should be used would depend on what one is trying to obtain by modelling. The assumption here appears to be that the least RMSE makes the best model, as we do not have a regression goal (other than OLS, which is THE “go to” default method most often used when a physically defined regression target is not even conceptualized).
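For reference, Tikhonov/ridge regression has a closed-form solution, (XᵀX + αI)⁻¹Xᵀy, where the smoothing factor α shrinks the coefficients toward zero and penalizes sharp, data-sensitive solutions. A minimal sketch on synthetic data (the actual data and choice of α are, as said above, for the OP to supply):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data; the OP's actual x1, x2, y would go here.
n = 200
X = rng.normal(size=(n, 2))
y = 2 * X[:, 0] + 7 * X[:, 1] + rng.normal(scale=2.5, size=n)

def ridge_fit(X, y, alpha):
    """Closed-form Tikhonov/ridge solution: (X'X + alpha*I)^-1 X'y.
    alpha=0 recovers ordinary least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

for alpha in (0.0, 1.0, 100.0):
    print(alpha, ridge_fit(X, y, alpha))
```

In practice one would choose α by cross-validation against a stated modelling goal, which is exactly the missing ingredient the answer below asks about.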
So, what is the purpose of performing this regression, please? Without defining that purpose, there is no regression goal or target, and we are just finding a regression for cosmetic purposes.