Partitioning trees in R: party vs. rpart

It’s been a while since I looked at partitioning trees. The last time I did this sort of thing, I liked party in R (created by Hothorn). The idea of conditional inference via sampling makes sense to me. But rpart also has its appeal.

In the current application (I can’t give details, but it involves trying to determine who will go to jail among a large sample of arrestees), I cannot use advanced methods such as random forests, bagging, or boosting; I need an easily explicable rule.

I would also like to have some manual control over which nodes split, as recommended in Zhang & Singer (2010), Recursive Partitioning and Applications. The freeware that comes with that book allows this, but it is otherwise rather primitive in how it handles user input.

Any recommendations or suggestions?

Answer

I agree with @Iterator that the methodology behind rpart is easier to explain. However, if you are looking for easily explainable rules, party (without bagged trees) loses nothing when it comes to explaining the prediction: you still have a single tree. If you are also interested in the drivers of the outcome variable (not just pure predictive power), I still think party is the way to go. A conventional decision tree such as rpart can be quite biased in how it selects variables and creates splits, because its exhaustive search over all possible splits favors variables that offer many candidate split points. The party package instead uses permutation tests to determine statistically which variables are most important and how the splits are made. So rather than leaning towards categorical variables with many levels, as rpart does, party uses statistical tests to find the tree structure.
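The selection bias described above can be illustrated with a small simulation. What follows is a minimal sketch in Python, not the actual rpart or party code: the response is pure noise (continuous, for simplicity), and there are two unrelated predictors, a binary one and a 10-level categorical one. The "exhaustive" rule mimics CART-style search by picking whichever variable gives the largest squared-error reduction over all binary splits; the "permutation" rule, a simplified stand-in for party's conditional inference tests, picks the variable with the smallest permutation-test p-value. All function names, sample sizes, and the choice of test statistic are assumptions made for illustration.

```python
import random


def _group_stats(y, x):
    """Return per-level sums and counts of y for the categorical predictor x."""
    sums, counts = {}, {}
    for yi, xi in zip(y, x):
        sums[xi] = sums.get(xi, 0.0) + yi
        counts[xi] = counts.get(xi, 0) + 1
    return sums, counts


def between_ss(y, x):
    """Between-group sum of squares of y across the levels of x.

    The total sum of squares does not change when y is permuted, so this
    statistic orders permutations the same way a one-way ANOVA F would.
    """
    sums, counts = _group_stats(y, x)
    grand = sum(y) / len(y)
    return sum(counts[k] * (sums[k] / counts[k] - grand) ** 2 for k in sums)


def best_split_gain(y, x):
    """CART-style exhaustive search: the largest squared-error reduction
    over all binary partitions of the levels of x.

    For squared-error loss it is enough to order the levels by their mean
    response and try each cut point between consecutive levels.
    """
    sums, counts = _group_stats(y, x)
    levels = sorted(sums, key=lambda k: sums[k] / counts[k])
    n, total = len(y), sum(y)
    best, left_sum, left_n = 0.0, 0.0, 0
    for k in levels[:-1]:
        left_sum += sums[k]
        left_n += counts[k]
        right_sum, right_n = total - left_sum, n - left_n
        # Gain of a two-way split: n_L * n_R / n * (mean_L - mean_R)^2
        gain = left_n * right_n / n * (left_sum / left_n - right_sum / right_n) ** 2
        best = max(best, gain)
    return best


def perm_pvalue(y, x, rng, n_perm=99):
    """Monte Carlo permutation p-value for association between x and y.

    Simplified stand-in (an assumption): party's conditional inference
    tests use a more general linear-statistic framework.
    """
    observed = between_ss(y, x)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if between_ss(y_perm, x) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def simulate(n_reps=200, n=100, seed=1):
    """Fraction of replicates in which each rule picks the 10-level predictor
    when neither predictor is related to the (pure noise) response."""
    rng = random.Random(seed)
    exhaustive_many = permutation_many = 0
    for _ in range(n_reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]    # noise response
        x_two = [rng.randrange(2) for _ in range(n)]   # binary predictor
        x_ten = [rng.randrange(10) for _ in range(n)]  # 10-level predictor
        if best_split_gain(y, x_ten) > best_split_gain(y, x_two):
            exhaustive_many += 1
        p_two = perm_pvalue(y, x_two, rng)
        p_ten = perm_pvalue(y, x_ten, rng)
        # Smallest p-value wins; exact ties are broken at random.
        if p_ten < p_two or (p_ten == p_two and rng.random() < 0.5):
            permutation_many += 1
    return exhaustive_many / n_reps, permutation_many / n_reps


if __name__ == "__main__":
    exhaustive, permutation = simulate()
    print(f"exhaustive search picks 10-level predictor: {exhaustive:.2f}")
    print(f"permutation test picks 10-level predictor:  {permutation:.2f}")
```

Even though neither predictor carries any signal, the exhaustive rule should pick the 10-level predictor far more than half the time, because it gets nine candidate splits against the binary predictor's one. The permutation rule should pick it only about half the time, since comparing p-values puts both variables on a common scale regardless of how many splits they offer.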

Attribution
Source: Link, Question Author: Peter Flom, Answer Author: B_Miner
