I am looking for a situation in which logistic regression does not work well, and in particular one where a random forest would be expected to outperform a logistic regression model.
Consider these data (copied from @Sycorax’s answer here: Can Random Forest be used for Feature Selection in Multiple Linear Regression?):
There are two aspects to the data in this figure. First, the relationship is non-linear. That isn't actually a problem for a properly specified logistic regression. In some cases, a logistic regression might fare better than a standard decision tree (cf., my answer here: How to use boxplots to find the point where values are more likely to come from different conditions?), although the comparison vis-à-vis a random forest is more ambiguous.

The bigger problem is that there is complete separation at the decision boundary. There are ways of trying to deal with that (see @Scortchi's answer here: How to deal with perfect separation in logistic regression?), but it adds complexity and requires considerable sophistication to address well. I think a random forest would handle this as a matter of course.
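To make both points concrete, here is a minimal sketch (assuming scikit-learn; the data-generating setup is my own illustration, not the figure above): the class depends non-linearly on x and the classes are completely separated, so a logistic regression that is linear in x fails, a random forest does not, and adding a quadratic term (i.e., specifying the model properly) rescues the logistic regression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1))
# Non-linear, perfectly separated boundary: class 1 on both tails
y = (np.abs(x[:, 0]) > 1).astype(int)

# Linear-in-x logistic regression: no single linear threshold can
# separate the two tails from the middle, so accuracy is capped
# around the majority-class rate (~2/3 here).
logit = LogisticRegression().fit(x, y)
print("linear logistic accuracy:", logit.score(x, y))

# A random forest carves up the axis with splits and handles
# both the non-linearity and the separation as a matter of course.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(x, y)
print("random forest accuracy:  ", rf.score(x, y))

# A properly specified logistic regression (adding x^2) recovers
# the boundary. Note: because the classes are perfectly separable,
# an unpenalized fit would drive the coefficients toward infinity;
# scikit-learn's default L2 penalty keeps them finite.
x2 = np.hstack([x, x**2])
logit2 = LogisticRegression().fit(x2, y)
print("logistic with x^2 term:  ", logit2.score(x2, y))
```

The point is not that logistic regression cannot handle these data, but that it only does so once you have diagnosed the non-linearity and dealt with the separation, whereas the forest needs neither intervention.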