Applying the “kernel trick” to linear methods?

The kernel trick is used in several machine learning models (e.g. SVMs). It was first introduced in 1964 in the paper “Theoretical foundations of the potential function method in pattern recognition learning” by Aizerman, Braverman and Rozonoer.

The Wikipedia definition says that it is

a method for using a linear classifier algorithm to solve a non-linear problem by mapping the original non-linear observations into a higher-dimensional space, where the linear classifier is subsequently used; this makes a linear classification in the new space equivalent to non-linear classification in the original space.
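The mapping described above can be made concrete with a small example. The sketch below (my own illustration, not from the original post) shows that for 2-D inputs, the degree-2 polynomial kernel (x·y)² equals an ordinary dot product after an explicit feature map φ into 3-D space:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2D:
    # (x . y)^2 == phi(x) . phi(y)
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# The kernel evaluates the dot product in the higher-dimensional
# space without ever constructing phi(x) explicitly.
assert np.isclose(np.dot(x, y)**2, np.dot(phi(x), phi(y)))
```

The point of the trick is that computing (x·y)² is cheap even when the implicit feature space is enormous (or infinite, as with the RBF kernel), so the mapping never has to be carried out explicitly.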

One example of a linear model that has been extended to non-linear problems is kernel PCA. Can the kernel trick be applied to any linear model, or does it have certain restrictions?


The kernel trick can be applied only to linear models in which the training examples enter the problem formulation solely through dot products (Support Vector Machines, PCA, etc.). In that case, every dot product can be replaced by a kernel evaluation, which implicitly computes a dot product in a higher-dimensional feature space.
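As a worked instance of the kernel PCA mentioned in the question, here is a minimal sketch (my own, using NumPy and an RBF kernel as assumptions): since PCA can be expressed entirely through the matrix of pairwise dot products, replacing that matrix with a kernel matrix and eigendecomposing it yields the non-linear projection.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2):
    # the dot product in an (infinite-dimensional) feature space.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecompose and keep the leading components
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training points onto the principal components
    return vecs * np.sqrt(np.maximum(vals, 0.0))

X = np.random.RandomState(0).randn(20, 3)
Z = kernel_pca(X, n_components=2, gamma=0.5)  # shape (20, 2)
```

Note that the algorithm never maps any point into the feature space explicitly; all it ever needs is the n×n kernel matrix, which is exactly why the "dot products only" restriction in the answer is the deciding criterion.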

Source: Link, Question Author: Shane, Answer Author: ebony1
