I’m estimating parameters for a complex, “implicit” nonlinear model f(\mathbf{x}, \boldsymbol{\theta}). It’s “implicit” in the sense that I don’t have an explicit formula for f: its value is the output of a complex computational fluid dynamics (CFD) code. After NLS regression, I had a look at the residuals, and they don’t look normal at all. I’m also having a lot of trouble estimating the variance-covariance matrix of the parameter estimates: the methods available in `nlstools` fail with an error.

I suspect that the assumption of normally distributed parameter estimators is not valid, so I would like to use a nonparametric method to estimate confidence intervals, p-values and confidence regions for the three parameters of my model. I thought of the bootstrap, but other approaches are welcome, so long as they don’t rely on normality of the parameter estimators. Would this work:

- given data set D=\{P_i=(\mathbf{x}_i,f_i)\}_{i=1}^N, generate datasets D_1,\dots,D_m by sampling with replacement from D
- For each D_i, use NLS (Nonlinear Least Squares) to estimate model parameters \boldsymbol{\theta}^*_i=(\theta^*_{1i},\theta^*_{2i},\theta^*_{3i})
- I now have an empirical distribution for the NLS parameter estimator. The sample mean of this distribution would be the bootstrap estimate of my parameters; the 2.5% and 97.5% quantiles would give me 95% confidence intervals. I could also make a scatterplot matrix of the parameters against each other, and get an idea of the correlation among them. This is the part I like the most, because I believe that one parameter is weakly correlated with the others, while the remaining two are extremely strongly correlated with each other.
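
To make the procedure concrete, here is a minimal Python sketch of the three steps above. Everything in it is an assumption for illustration only: `toy_model` is a cheap exponential stand-in for the CFD code, the data are synthetic with deliberately non-normal (Laplace) noise, and m = 200 is an arbitrary number of resamples. It is not my actual setup.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

def toy_model(x, theta):
    """Hypothetical cheap stand-in for the CFD code: a * exp(-b * x) + c."""
    a, b, c = theta
    return a * np.exp(-b * x) + c

def fit_nls(x, y, theta0):
    """One NLS fit; in the real problem this is the step that takes hours."""
    result = least_squares(lambda th: toy_model(x, th) - y, theta0)
    return result.x

# Synthetic data set D = {(x_i, f_i)}, i = 1..N, with Laplace (non-normal) noise
theta_true = np.array([2.0, 1.5, 0.5])
N = 60
x = np.linspace(0.0, 3.0, N)
y = toy_model(x, theta_true) + rng.laplace(scale=0.05, size=N)

# Fit on the original data, starting from an assumed initial guess
theta_hat = fit_nls(x, y, np.array([1.5, 1.0, 0.3]))

# Steps 1 and 2: resample (x_i, f_i) pairs with replacement and refit each time
m = 200
boot = np.empty((m, 3))
for j in range(m):
    idx = rng.integers(0, N, size=N)          # N indices drawn with replacement
    boot[j] = fit_nls(x[idx], y[idx], theta_hat)

# Step 3: summaries of the empirical distribution of the estimator
boot_mean = boot.mean(axis=0)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)  # percentile intervals
corr = np.corrcoef(boot, rowvar=False)                      # 3 x 3 correlation matrix

print("bootstrap mean:", boot_mean)
print("95% percentile CI:", list(zip(ci_low, ci_high)))
print("correlation matrix:\n", corr)
```

Each row of `boot` is one \boldsymbol{\theta}^*_i, so a scatterplot matrix of its columns would show the correlations I am interested in.
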
Is this correct? If so, how do I compute the p-values? What is the null hypothesis for a nonlinear regression model? For example, for parameter \theta_{3}, is it that \theta_{3}=0 while the other two are nonzero? How would I compute the p-value for such a hypothesis from my bootstrap sample \boldsymbol{\theta}^*_1,\dots,\boldsymbol{\theta}^*_m? I don’t see the connection with the null…

Also, each NLS fit takes quite some time (let’s say a few hours), because I need to run my fluid dynamics code p\times N times, where N is the size of D and p is about 40 in my case. The total CPU time for the bootstrap is then 40\times N \times m times the duration of a single CFD run, which is a lot. I would need a faster way. What can I do? I thought of building a metamodel for my CFD code (for example, a Gaussian Process model) and using that for bootstrapping instead of the CFD code. What do you think? Would that work?
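
As a rough illustration of the metamodel idea, here is a minimal Python sketch. All of it is assumed for illustration: `expensive_model` is a cheap hypothetical stand-in for one CFD run, the parameter box `lo`/`hi` and the 80 training runs are arbitrary choices, and the Gaussian Process is reduced to its posterior mean with a hand-picked squared-exponential kernel and fixed length scale (a real emulator would estimate its hyperparameters). The emulator maps \boldsymbol{\theta} to the model output at the N observed inputs, so the CFD code is only run for the training design.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

def expensive_model(x, theta):
    """Hypothetical stand-in for one CFD run: a * exp(-b * x) + c."""
    a, b, c = theta
    return a * np.exp(-b * x) + c

# Observed data (synthetic here) at N fixed input points
N = 40
x = np.linspace(0.0, 3.0, N)
theta_true = np.array([2.2, 1.4, 0.45])
y = expensive_model(x, theta_true) + rng.laplace(scale=0.05, size=N)

# Training design: parameter vectors drawn in an assumed plausible box
lo = np.array([1.0, 0.5, 0.0])
hi = np.array([3.0, 2.5, 1.0])
n_train = 80
Theta = rng.uniform(lo, hi, size=(n_train, 3))
F = np.array([expensive_model(x, th) for th in Theta])  # the only "CFD" runs

def scale(T):
    """Map parameter vectors to the unit cube."""
    return (np.atleast_2d(T) - lo) / (hi - lo)

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

F_mean = F.mean(axis=0)
K = rbf(scale(Theta), scale(Theta)) + 1e-6 * np.eye(n_train)  # jitter for stability
alpha = np.linalg.solve(K, F - F_mean)

def surrogate(theta):
    """GP posterior mean of the model output at all N input points."""
    return F_mean + (rbf(scale(theta), scale(Theta)) @ alpha)[0]

def fit_surrogate(idx, theta0):
    """NLS on a resampled data set, using the emulator instead of the model."""
    res = least_squares(lambda th: surrogate(th)[idx] - y[idx], theta0,
                        bounds=(lo, hi))  # stay inside the training box
    return res.x

# Bootstrap with the emulator: no further calls to expensive_model
m = 200
theta0 = 0.5 * (lo + hi)
boot = np.array([fit_surrogate(rng.integers(0, N, size=N), theta0)
                 for _ in range(m)])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)

emulator_error = np.max(np.abs(surrogate(theta_true)
                               - expensive_model(x, theta_true)))
print("max emulator error at theta_true:", emulator_error)
print("95% percentile CI:", list(zip(ci_low, ci_high)))
```

One thing this sketch makes obvious is that the intervals ignore the emulator's own approximation error, which is part of what I am unsure about.
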

EDIT: I don’t think the NLS regression problem is convex. NLS is being used to find the calibration parameters of a 1D CFD (Computational Fluid Dynamics) code that best agree with the data. If that helps, a plot of residuals can be seen here. I can add other plots (QQ plot?) if needed.

I have no theoretical guarantee that there is only a single parameter vector \boldsymbol{\theta} which minimizes the RSS. One may wonder why I use NLS then. The main reason is pragmatic: calibrating the code is slow. A tool which can quickly compute an estimate \boldsymbol{\theta}^* such that \text{RSS}(\boldsymbol{\theta}^*)<\text{RSS}(\boldsymbol{\theta}_0), together with a reliable measure of uncertainty in my estimates, would be better than nothing. NLS is fast compared with, say, Bayesian inference with MCMC. However, since I then have to use the bootstrap to get a reliable uncertainty estimate, I admit the advantage is somewhat reduced. I still think that the computational effort is lower, but if you believe I'm using the wrong approach and should do something totally different, I'm open to suggestions.

EDIT 2: The setting is exactly the same as here. I'd be glad to provide any other details you need.

