Many machine learning algorithms seem to rely on kernel functions, SVMs and NNs to name but two. So what is the definition of a kernel function, and what are the requirements for it to be valid?
For x, y in a set S, certain functions K(x, y) can be expressed as an inner product ⟨φ(x), φ(y)⟩ for some mapping φ into an inner product space (usually a different, often higher-dimensional space). Such a K is referred to as a kernel, or kernel function. The word "kernel" is used in different ways throughout mathematics, but this is the most common usage in machine learning.
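As a small numerical sketch of this definition (using NumPy; the feature map phi below is the standard explicit map for the homogeneous degree-2 polynomial kernel on 2-D inputs):

```python
import numpy as np

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def phi(v):
    """Explicit feature map for 2-D inputs: phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# The kernel value equals an ordinary inner product in the feature space.
print(poly2_kernel(x, y))        # 121.0
print(np.dot(phi(x), phi(y)))    # 121.0
```

The point is that evaluating K directly never requires constructing phi(x) or phi(y), yet the two computations agree exactly.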
The kernel trick is a way of mapping observations from a general set S into an inner product space V (equipped with its natural norm) without ever having to compute the mapping explicitly, in the hope that the observations gain meaningful linear structure in V. This is important for efficiency (dot products in a very high-dimensional space can be computed very quickly) and practicality (linear ML algorithms can be converted into non-linear ones).
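The efficiency point can be sketched with the Gaussian (RBF) kernel, whose implicit feature space is infinite-dimensional, yet each kernel evaluation costs only O(d) in the input dimension d (a minimal sketch; the bandwidth gamma and the data here are arbitrary choices):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2).
    Its feature space is infinite-dimensional, but evaluation is O(d)."""
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # 5 points in R^3

# Gram matrix: pairwise inner products in the implicit feature space,
# computed without ever constructing a feature map.
G = np.array([[rbf_kernel(a, b) for b in X] for a in X])
print(G.shape)  # (5, 5)
```

A kernelized algorithm (an SVM, kernel ridge regression, etc.) only ever touches the data through such Gram matrix entries, which is exactly what makes the trick work.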
For a function K to be considered a valid kernel, it has to satisfy Mercer's condition. In practical terms this means that the kernel matrix (obtained by evaluating K on every pair of datapoints you have) must always be positive semi-definite. This in turn ensures that the training objective function (e.g. for an SVM) is convex, a very important property.
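This can be checked empirically by looking at the eigenvalues of the kernel matrix. A minimal sketch, assuming random 2-D data; the negative-squared-distance "kernel" is a deliberately invalid example contrasted with the RBF kernel:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2))

# Pairwise squared Euclidean distances.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# Valid kernel: RBF. Its kernel matrix is positive semi-definite.
G_rbf = np.exp(-0.5 * sq_dists)
# Invalid "kernel": K(x, y) = -||x - y||^2. Not PSD in general.
G_bad = -sq_dists

tol = 1e-8
print(np.linalg.eigvalsh(G_rbf).min() >= -tol)  # True: PSD
print(np.linalg.eigvalsh(G_bad).min() >= -tol)  # False: not PSD
```

A single PSD kernel matrix does not prove K is a valid kernel on all of S, but a single non-PSD one does disprove it, which makes this a useful quick sanity check.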