Based on Gradient Boosting Tree vs Random Forest: GBDT and RF use different strategies to tackle bias and variance.
My question is: can I resample the dataset (with replacement) to train multiple GBDTs and combine their predictions as the final result?
This would be equivalent to building a random forest with GBDT as the base learner.
The idea is that GBDT can overfit the dataset (similar to a fully grown decision tree: low bias, high variance). I hope that applying a bagging technique can reduce this problem and yield better performance.
Yes, you can. Bagging as a technique does not rely on a single classification or regression tree being the base learner; you can do it with anything, although many base learners (e.g., linear regression) are of less value than others. The bootstrap aggregating article on Wikipedia contains an example of bagging LOESS smoothers on ozone data.
If you were to do so, however, you would almost certainly not want to use the same parameters as a fully-tuned single GBM. A large part of the point of tuning a GBM is to prevent overfitting; bagging reduces overfitting through a different mechanism, so if your tuned GBM doesn't overfit much, bagging probably won't help much either. And since you're likely to need hundreds of trees to bag effectively, your runtime will go up by a factor of several hundred as well. So now you have two problems: how to tune your GBM given that it's embedded in a random forest (although getting it exactly right likely matters less, precisely because it's embedded in a random forest), and the runtime issue.
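To make the idea concrete, here is a minimal sketch of bagging GBMs using scikit-learn; the dataset, sizes, and hyperparameters are illustrative assumptions, not recommendations:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

# Synthetic regression data just for illustration
X, y = make_friedman1(n_samples=500, random_state=0)

# Each of the 10 replicates trains its own GBM on a bootstrap sample
# (bootstrap=True resamples with replacement, as proposed in the question).
bagged_gbm = BaggingRegressor(
    GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0),
    n_estimators=10,
    bootstrap=True,
    random_state=0,
)
bagged_gbm.fit(X, y)

# The final prediction is the average over the 10 GBMs' predictions
preds = bagged_gbm.predict(X)
```

Note the runtime point above: with 10 bags of 100 trees each, this fits 1,000 trees where a single GBM would fit 100.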
Having written all that, it is true that bagging-type thinking can be profitably integrated with GBM, although in a different manner. H2O, for example, provides the option to have each tree of the GBM tree sequence developed on a random sample of the training data. This sampling is done without replacement, as sampling with replacement is thought to cause the resultant tree to overfit those parts of the sample that were repeated. This approach was explicitly motivated by Breiman's "adaptive bagging" procedure; see Friedman's 1999 "Stochastic Gradient Boosting" paper for details.
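In scikit-learn the analogous knob is the `subsample` parameter of `GradientBoostingRegressor`: setting it below 1 makes each tree fit on a random fraction of the rows, drawn without replacement, which is exactly Friedman's stochastic gradient boosting. A minimal sketch (data and settings are assumptions for illustration):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, random_state=0)

# subsample=0.5: each boosting iteration sees a random half of the
# training rows, sampled without replacement (stochastic gradient boosting).
sgb = GradientBoostingRegressor(
    n_estimators=200, subsample=0.5, random_state=0
)
sgb.fit(X, y)
preds = sgb.predict(X)
```

Unlike the bagged-GBM construction above, this keeps a single boosted sequence, so the runtime stays comparable to an ordinary GBM.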