I’ve obtained a logistic regression model (via train) for a binary response, and I’ve obtained the logistic confusion matrix via confusionMatrix in caret. It gives me the logistic model confusion matrix, though I’m not sure what threshold is being used to obtain it. How do I obtain the confusion matrix for specific threshold values using confusionMatrix in caret?
Answer
Most classification models in R produce both a class prediction and the probabilities for each class. For binary data, in almost every case, the class prediction is based on a 50% probability cutoff. glm is the same. With caret, predict(object, newdata) gives you the predicted class and predict(object, newdata, type = "prob") gives you class-specific probabilities (when object is generated by train).
You can do things differently by defining your own model and applying whatever cutoff you want. The caret website also has an example that uses resampling to optimize the probability cutoff.
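A simpler route than a custom model, if you only need the confusion matrix at a given threshold, is to cut the class probabilities yourself and hand the resulting factor to confusionMatrix. The following is a minimal sketch under stated assumptions: the probs data frame is made-up data standing in for the output of predict(object, newdata, type = "prob"), the class levels "no" and "yes" are invented for illustration, and classify_at is a hypothetical helper, not a caret function.

```r
# Made-up class probabilities, shaped like the output of
# predict(object, newdata, type = "prob") for a two-class problem.
probs <- data.frame(
  no  = c(0.90, 0.65, 0.55, 0.30, 0.20, 0.62),
  yes = c(0.10, 0.35, 0.45, 0.70, 0.80, 0.38)
)

# Made-up observed classes for the same six cases.
obs <- factor(c("no", "no", "yes", "yes", "yes", "no"),
              levels = c("no", "yes"))

# Hypothetical helper: label a case "yes" when its probability
# reaches the cutoff, keeping the same factor levels as obs.
classify_at <- function(prob_yes, cutoff) {
  factor(ifelse(prob_yes >= cutoff, "yes", "no"), levels = c("no", "yes"))
}

pred_50 <- classify_at(probs$yes, 0.50)  # the default-style 50% cutoff
pred_40 <- classify_at(probs$yes, 0.40)  # a custom 40% cutoff

# Base R cross-tabulation of predictions against observations.
tab_50 <- table(Prediction = pred_50, Reference = obs)
tab_40 <- table(Prediction = pred_40, Reference = obs)
print(tab_50)
print(tab_40)

# If caret is installed, the same factors can be passed to confusionMatrix.
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(data = pred_40, reference = obs,
                               positive = "yes"))
}
```

Keeping the factor levels of the thresholded predictions identical to those of the observed classes matters here, since confusionMatrix compares the two factors level by level.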
tl;dr
confusionMatrix uses the predicted classes and thus a 50% probability cutoff.
Max
Attribution
Source: Link, Question Author: Black Milk, Answer Author: Dan Villarreal