I want to know if the process described below is valid/acceptable and any justification available.

The idea: Supervised learning algorithms generally don’t assume an underlying structure/distribution for the data; at the end of the day they output point estimates, and I’d like to quantify the uncertainty of those estimates somehow. Now, the model-building process is inherently random (e.g. in the sampling for cross-validation during hyperparameter tuning, and in the row subsampling in stochastic GBM), so a modeling pipeline is going to give me a different output for the same predictors with each different seed. My (naive) idea is to run this process over and over again to build up a distribution of the prediction, which I can hopefully use to make statements about the uncertainty of the predictions.
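To make the idea concrete, here is a minimal sketch in Python/scikit-learn (the synthetic dataset, the tiny parameter grid, and the choice of 20 seeds are all illustrative assumptions, not part of the question): rerun the whole tuning-plus-fitting pipeline under a different seed each time and look at the spread of the resulting predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Illustrative small dataset (~200 rows, as in the question).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
x_new = X[:1]  # the point whose prediction distribution we want

preds = []
for seed in range(20):
    # One full pipeline run per seed: both the CV splits used for tuning
    # and the stochastic-GBM row subsampling change with the seed.
    tuner = GridSearchCV(
        GradientBoostingRegressor(subsample=0.5, random_state=seed),
        param_grid={"max_depth": [1, 2]},
        cv=KFold(n_splits=3, shuffle=True, random_state=seed),
    )
    tuner.fit(X, y)
    preds.append(tuner.predict(x_new)[0])

preds = np.array(preds)
# Spread of the prediction across seeds (not a calibrated interval).
lo, hi = np.percentile(preds, [2.5, 97.5])
```

Note that this spread only reflects the algorithmic randomness of the pipeline on a fixed dataset; it says nothing about the sampling variability of the data itself.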

If it matters, the datasets I work with are typically very small (~200 rows.)

Does this make sense?

To clarify, I’m not actually bootstrapping the data in the traditional sense (i.e. I’m not re-sampling the data). The same dataset is used in every iteration; I’m just exploiting the randomness in cross-validation and stochastic GBM.

**Answer**

To me it seems as good an approach as any to quantify the uncertainty in the predictions. Just make sure to repeat all modeling steps (for a GBM, that would include the hyperparameter tuning) from scratch in every bootstrap resample. It could also be worthwhile to bootstrap the importance rankings to quantify the uncertainty in the rankings.
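A sketch of that advice in Python/scikit-learn (dataset, grid, and 20 resamples are illustrative): the key point is that `GridSearchCV` runs inside the bootstrap loop, so the tuning is redone from scratch on every resample, and the importance ranking is recorded per resample as well.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
x_new = X[:1]

rng = np.random.default_rng(0)
boot_preds, boot_ranks = [], []
for b in range(20):
    idx = rng.integers(0, len(y), size=len(y))  # sample rows with replacement
    tuner = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid={"max_depth": [1, 2]},
        cv=3,
    )
    tuner.fit(X[idx], y[idx])  # tuning repeated from scratch in each resample
    boot_preds.append(tuner.predict(x_new)[0])
    # Feature indices sorted from most to least important in this resample.
    imp = tuner.best_estimator_.feature_importances_
    boot_ranks.append(np.argsort(-imp))

boot_preds = np.array(boot_preds)
boot_ranks = np.array(boot_ranks)
# How often each feature lands in the top spot across resamples:
top_feature_freq = np.bincount(boot_ranks[:, 0], minlength=X.shape[1]) / len(boot_ranks)
```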

I have found that the intervals sometimes do not contain the actual prediction, especially when estimating a probability. Increasing the minimum number of observations in each terminal node usually fixes that, at least on the data I have worked with.
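If you are fitting the GBM through scikit-learn, the corresponding knob is `min_samples_leaf` (in R's gbm package it is `n.minobsinnode`); the value 20 below is an illustrative assumption, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each terminal node must average over at least 20 observations, which
# tends to pull predicted probabilities away from the 0/1 extremes.
clf = GradientBoostingClassifier(min_samples_leaf=20, random_state=0).fit(X, y)
proba = clf.predict_proba(X)[:, 1]
```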

Conformal prediction seems like a useful approach for quantifying the confidence in predictions on new data. I have only scratched the surface so far, and others are probably better suited to give an opinion on it.
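For what it's worth, the simplest variant (split conformal) is only a few lines; a hedged Python sketch (synthetic dataset and α = 0.1 are illustrative assumptions): fit on one half, then use the absolute residuals on a held-out calibration half to get a distribution-free interval width.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Nonconformity scores on the calibration half.
scores = np.abs(y_cal - model.predict(X_cal))
n, alpha = len(scores), 0.1
# Finite-sample-corrected quantile, aiming for >= 1 - alpha coverage.
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

pred = model.predict(X[:1])[0]
interval = (pred - q, pred + q)
```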

There is some crude R code in my reply to this post about finding a GBM prediction interval.

Hope this helps!

**Attribution**
*Source: Link, Question Author: kevinykuo, Answer Author: Community*