For a classification problem (assume that the loss function is the negative binomial log-likelihood), the gradient boosting (GBM) algorithm computes the residuals (the negative gradient) and then fits them using a regression tree with mean squared error (MSE) as the splitting criterion. How is that different from the XGBoost algorithm?
Does XGBoost utilize regression trees to fit the negative gradient?
Is the only difference between GBM and XGBoost the regularization terms or does XGBoost use another split criterion to determine the regions of the regression tree?
@jbowman has the right answer: XGBoost is a particular implementation of GBM.
GBM is an algorithm; you can find the details in Friedman's paper "Greedy Function Approximation: A Gradient Boosting Machine".
XGBoost is an implementation of GBM in which you can configure which base learner is used (via the booster parameter). It can be a tree, a stump, or another model, even a linear model.
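To make the "pluggable base learner" point concrete, here is a rough sketch of the generic boosting loop in plain Python. It is not XGBoost's code or Friedman's exact algorithm: it uses a fixed shrinkage instead of a line search, a single feature, and the names fit_stump, fit_linear and gradient_boost are made up for illustration. The only thing that changes between the two runs is the function used to fit the negative gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, r):
    """Least-squares regression stump on one feature (MSE split criterion)."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds; both sides stay non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - ml) ** 2 for ri in left)
               + sum((ri - mr) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def fit_linear(x, r):
    """Least-squares line r ~ a + b * x."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    a = mr - b * mx
    return lambda xi: a + b * xi

def gradient_boost(x, y, fit_base, n_rounds=50, lr=0.5):
    """Boost on the logistic loss; fit_base is the pluggable base learner."""
    learners = []

    def score(xi):
        return sum(lr * h(xi) for h in learners)

    for _ in range(n_rounds):
        # Negative gradient of the log loss w.r.t. the score is y - p.
        residuals = [yi - sigmoid(score(xi)) for xi, yi in zip(x, y)]
        learners.append(fit_base(x, residuals))
    return lambda xi: sigmoid(score(xi))

# Toy data (assumed for illustration): the label is 1 when x > 0.
x = [i / 10 for i in range(-10, 11) if i != 0]
y = [1 if xi > 0 else 0 for xi in x]

stump_model = gradient_boost(x, y, fit_stump)
linear_model = gradient_boost(x, y, fit_linear)
```

Both stump_model and linear_model return a probability for a given x; the boosting loop itself never needs to know which kind of base learner it was handed.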
Here is an example of using a linear model as the base learner in XGBoost.