In R's rpart() routine for fitting CART models, you specify the complexity parameter (cp) at which you want to prune your tree. I have seen two different recommendations for choosing the complexity parameter:
Choose the complexity parameter associated with the minimum possible cross-validated error. This method is recommended by Quick-R and HSAUR.
Choose the greatest complexity parameter whose estimated cross-validated error is still within one standard error of the minimum possible cross-validated error. This is my interpretation of the package documentation, which says: “A good choice of cp for pruning is often the leftmost value for which the mean lies below the horizontal line” in reference to this plot.
The two choices of cp produce quite different trees in my dataset.
It seems that the first method will always produce a more complex, and potentially overfitted, tree. Are there other advantages, disadvantages, or recommendations in the literature that I should take into account when deciding which method to use? I can provide more information about my particular modelling problem if that would be useful, but I am trying to keep this question broad enough to be relevant to others.
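For concreteness, here is a minimal sketch of how I am fitting the tree and reading off the first rule's cp, using the package's built-in kyphosis example data rather than my own (the small starting cp is just an arbitrary value to grow a deliberately large tree):

```r
library(rpart)

set.seed(1)  # xerror comes from random cross-validation folds, so seed for reproducibility

# Grow a deliberately large tree so there is something to prune back
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, cp = 0.001)

printcp(fit)  # CP table: CP, nsplit, rel error, xerror (CV error), xstd
plotcp(fit)   # the dashed horizontal line is min(xerror) + xstd

# Method 1: prune at the cp with the minimum cross-validated error
cp_min    <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
tree_min  <- prune(fit, cp = cp_min)
```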
In practice I have seen both approaches taken, and generally I would not expect your results to differ much either way.
That being said, Hastie et al. recommend the “one-standard-error” rule in The Elements of Statistical Learning, and I tend to trust their judgment (Section 7.10, p. 244 in my version). The relevant quote is:
“Often a ‘one-standard error’ rule is used with cross-validation, in which we choose the most parsimonious model whose error is no more than one standard error above the error of the best model.”
Your intuition for why one would follow the one-standard-error rule is right: it guards against selecting a model that overfits the data.
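As a sketch, the one-standard-error rule can be applied directly to rpart's cptable (assuming a fitted rpart object named fit; the column names are as printed by printcp()):

```r
cp_tab <- fit$cptable                    # columns: CP, nsplit, rel error, xerror, xstd
best   <- which.min(cp_tab[, "xerror"])  # row with the minimum cross-validated error
thresh <- cp_tab[best, "xerror"] + cp_tab[best, "xstd"]  # minimum error + one SE

# cptable rows run from the simplest tree (nsplit = 0) to the most complex,
# so the first row at or below the threshold is the most parsimonious choice
cp_1se <- cp_tab[which(cp_tab[, "xerror"] <= thresh)[1], "CP"]
pruned <- prune(fit, cp = cp_1se)
```

Because cross-validated errors are themselves noisy, picking the simplest tree within one SE of the minimum trades a negligible (and possibly illusory) gain in accuracy for a real gain in parsimony.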