The kernel trick is used in several machine learning models (e.g. SVM). It was first introduced in the 1964 paper “Theoretical foundations of the potential function method in pattern recognition learning”.

The Wikipedia definition says that it is

> a method for using a linear classifier algorithm to solve a non-linear problem by mapping the original non-linear observations into a higher-dimensional space, where the linear classifier is subsequently used; this makes a linear classification in the new space equivalent to non-linear classification in the original space.

One example of a linear model that has been extended to non-linear problems is kernel PCA. Can the kernel trick be applied to any linear model, or does it have certain restrictions?

**Answer**

The kernel trick can only be applied to linear models whose problem formulation can be written so that the training examples appear exclusively through dot products (e.g. support vector machines, PCA). Each dot product can then be replaced by a kernel evaluation, which implicitly computes the dot product in a higher-dimensional feature space without ever constructing that space.
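To see why the dot-product requirement is the key restriction, here is a small sketch (my own illustration, not from the original answer) using the degree-2 polynomial kernel k(x, z) = (x · z)². It checks that evaluating the kernel directly gives the same number as explicitly mapping both points into the higher-dimensional feature space and taking a dot product there:

```python
import numpy as np

def phi(x):
    # Hypothetical explicit feature map for k(x, z) = (x . z)^2
    # in 2 dimensions: phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2).
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    # The kernel computes the same quantity directly in the
    # original space, never forming phi(x) or phi(z).
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product in feature space
implicit = poly_kernel(x, z)       # kernel evaluation: same value
print(explicit, implicit)          # both are 121.0
```

Any algorithm that only ever touches the data through dot products can therefore swap in `poly_kernel` (or an RBF kernel, whose feature space is infinite-dimensional) and operate non-linearly at no extra cost.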

**Attribution**
*Source : Link , Question Author : Shane , Answer Author : ebony1*