Classification algorithms that return confidence?

Given a machine learning model built on top of scikit-learn, how can I classify new instances but then choose only those with the highest confidence? How do we define confidence in machine learning, and how do we generate it (if it is not generated automatically by scikit-learn)? What should I change in this approach if I had more than 2 potential classes?

This is what I have done so far:

# load libraries
from sklearn import neighbors
# initialize NearestNeighbor classifier
knn = neighbors.KNeighborsClassifier(n_neighbors=3)
# train model
knn.fit([[1],[2],[3],[4],[5],[6]], [0,0,0,1,1,1])
# predict: get class probabilities
# (predict_proba expects a 2D array: one row per sample, one column per feature)
print(knn.predict_proba([[1.5]]))
print(knn.predict_proba([[37]]))
print(knn.predict_proba([[3.5]]))

Answer

Three questions:

  1. “How do we define confidence in machine learning and how to generate it (if not generated automatically by scikit-learn)?”
    Here’s a great summary of different types of confidence measures in machine learning. The specific measure to use depends on which algorithm/model you use.

  2. “How can I classify new instances but then choose only those with the highest confidence?”
    and

  3. “What should I change in this approach if I had more than 2 potential classes?”
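
To make "confidence" a bit more concrete: for classifiers that expose predict_proba, a few simple scores can be computed from the probability matrix. This is only a rough sketch, not the one correct definition. The `probs` array below is made-up data standing in for a predict_proba result, and the names `max_prob`, `margin` and `entropy` are just labels I picked for illustration.

```python
import numpy as np

# Hypothetical predict_proba output (made-up numbers):
# one row per example, one column per class.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.10, 0.80],
])

# 1) Probability of the most likely class (higher = more confident).
max_prob = probs.max(axis=1)

# 2) Margin between the two most likely classes (higher = more confident).
sorted_probs = np.sort(probs, axis=1)
margin = sorted_probs[:, -1] - sorted_probs[:, -2]

# 3) Entropy of the class distribution (lower = more confident).
#    The small constant avoids log(0) for zero probabilities.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

print("max probability:", max_prob)
print("margin:", margin)
print("entropy:", entropy)
```

All three work for any number of classes; with only 2 classes they rank examples in the same order, so the choice mostly matters in the multi-class case.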

Here’s a quick script that you can play around with. It expands on what you started with and shows how to handle an arbitrary number of classes and how to find the likely classes for each predicted example. I like numpy and pandas (which you’re likely using if you’re using sklearn).

from sklearn import neighbors
import pandas as pd
import numpy as np
number_of_classes = 3  # number of possible classes
number_of_features = 2 # number of features for each example
train_size = 20        # number of training examples
predict_size = 5       # number of examples to predict
# Generate a random training set with random classes assigned
X = np.random.randint(100, size=(train_size, number_of_features))
y = np.random.randint(number_of_classes, size=train_size)
# initialize NearestNeighbor classifier
knn = neighbors.KNeighborsClassifier(n_neighbors=3)
# train model
knn.fit(X, y)
# values to predict classes for
predict = np.random.randint(100, size=(predict_size, number_of_features))
print("generated examples to predict:\n", predict, "\n")
# predict class probabilities for each class for each value and convert to DataFrame
# (one row per example, one column per class)
probs = pd.DataFrame(knn.predict_proba(predict))
print("all probabilities:\n", probs, "\n")
for c in range(number_of_classes):
    # keep only the examples where class c gets more than half of the neighbor votes
    likely = probs[probs[c] > 0.5]
    print("class " + str(c) + " probability > 0.5:\n", likely)
    print("indexes of likely class " + str(c) + ":", likely.index.tolist(), "\n")
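
If you want a single "predicted class plus confidence" per example rather than a per-class loop, one possible variation is to take the highest probability in each row and keep only the rows that clear a cutoff. This is a sketch under my own assumptions: the toy training data, the `threshold` value of 0.9 and the variable names are all made up for illustration, so tune them for your data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-feature training set with 3 classes (made-up data).
X_train = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9]])
y_train = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# New examples to classify.
X_new = np.array([[1.5], [3.6], [8.0], [6.4]])
probs = knn.predict_proba(X_new)

# Most likely class and its probability for each example.
# classes_ maps probability columns back to the original labels.
predicted = knn.classes_[probs.argmax(axis=1)]
confidence = probs.max(axis=1)

# Keep only the predictions whose top probability clears the cutoff.
threshold = 0.9  # illustrative value, not a recommendation
keep = confidence >= threshold

print("predicted classes:", predicted)
print("confidence:", confidence)
print("confident examples:\n", X_new[keep])
print("their classes:", predicted[keep])
```

Keep in mind that with n_neighbors=3 the probabilities can only be 0, 1/3, 2/3 or 1, so a larger k gives you finer-grained confidence values to threshold on.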

Attribution
Source : Link , Question Author : user2295350 , Answer Author : Dave Novelli
