# Does Gaussian process regression have the universal approximation property?

Can any continuous function on $[a, b]$, where $a$ and $b$ are real numbers, be approximated arbitrarily closely (in some norm) by Gaussian process regression?

The first interpretation is: let $X$ be a compact subset of $\mathbb{R}^d$ (compactness is essential for everything that follows!), let $k(\mathbf{x},\mathbf{x}')$ be a continuous covariance function (or kernel) defined on $X\times X$, and denote by $C(X)$ the normed space of continuous functions on $X$, equipped with the maximum norm $||\cdot||_{\infty}$. For any function $f\in C(X)$, can $f$ be approximated to a prespecified tolerance $\epsilon$ by a function in the RKHS (Reproducing Kernel Hilbert Space) associated with $k$?

You may well wonder what an RKHS is, and what all this has to do with Gaussian process regression. The RKHS $K(X)$ is the closure of the vector space formed by all finite linear combinations of the functions $f_\mathbf{y}(\mathbf{x})=k(\mathbf{x},\mathbf{y})$, where $\mathbf{y}\in X$. This is closely related to Gaussian process regression, because given a Gaussian process prior $GP(0,k(\mathbf{x},\mathbf{x}'))$ on the space $C(X)$, the (closure of the) space of all possible posterior means which can be generated by Gaussian process regression is exactly the RKHS. As a matter of fact, all possible posterior means are of the form

$$\bar{f}(\mathbf{x})=\sum_{i=1}^n c_i\, k(\mathbf{x},\mathbf{x}_i)$$

i.e., they are finite linear combinations of the functions $f_{\mathbf{x}_i}(\mathbf{x})=k(\mathbf{x},\mathbf{x}_i)$. Thus we're effectively asking whether, given a Gaussian process prior $GP(0,k(\mathbf{x},\mathbf{x}'))$ on $C(X)$, for any function $f\in C(X)$ there is always a function $f^*$ in the (closure of the) space of all functions which can be generated by GPR, which is as close as desired to $f$.
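To make the "finite linear combination of kernel slices" form concrete, here is a minimal NumPy sketch of GP regression with a squared exponential kernel. The data, length scale and noise level are illustrative assumptions, not from the text; the point is only that the posterior mean is computed exactly as $\sum_i c_i k(\mathbf{x},\mathbf{x}_i)$ with $\mathbf{c}=(K+\sigma^2 I)^{-1}\mathbf{y}$:

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    # Squared exponential kernel k(x, y) = exp(-|x - y|^2 / (2 l^2)) on 1-D inputs
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

# Illustrative training set: noisy observations of an unknown f at points x_i
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.05 * rng.standard_normal(10)

noise_var = 0.05 ** 2
K = sq_exp_kernel(x_train, x_train)
# Coefficient vector c = (K + sigma^2 I)^{-1} y
c = np.linalg.solve(K + noise_var * np.eye(len(x_train)), y_train)

def posterior_mean(x_new):
    # mean(x) = sum_i c_i k(x, x_i): a finite linear combination of kernel slices,
    # hence an element of the (pre-)RKHS generated by k
    return sq_exp_kernel(np.atleast_1d(x_new), x_train) @ c
```

Whatever the training set, the resulting posterior mean never leaves the span of the functions $k(\cdot,\mathbf{x}_i)$; that is exactly why the RKHS is the right space in which to pose the approximation question.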
The answer, for some particular kernels (including the classic Squared Exponential kernel, but not the polynomial kernel), is yes. It can be proved that for such kernels $K(X)$ is dense in $C(X)$, i.e., for any $f\in C(X)$ and any tolerance $\epsilon$, there is an $f^*\in K(X)$ such that $||f-f^*||_{\infty}<\epsilon$. Note the assumptions: $X$ is compact, $f$ is continuous, and $k$ is a continuous kernel with the so-called universal approximation property (such kernels are often simply called universal kernels). See here for a full proof in a more general (and thus more complicated) setting.
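The density result can be illustrated numerically (this is not a proof, just a sanity check under arbitrary choices of target function, length scale, ridge term and grid): fit RKHS elements $\sum_i c_i k(\cdot,\mathbf{x}_i)$ built from a squared exponential kernel to a continuous but non-smooth target such as $f(x)=|x-1/2|$ on $[0,1]$, and watch the sup-norm error shrink as more kernel centers are used.

```python
import numpy as np

def sq_exp(A, B, ell=0.1):
    # Squared exponential kernel on 1-D inputs
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * ell ** 2))

def rkhs_sup_error(n_centers, f, grid):
    # Fit an RKHS element sum_i c_i k(., x_i) to f at n_centers points
    # (tiny ridge term for numerical stability), then measure the max error
    # on a fine grid as a proxy for the sup norm.
    centers = np.linspace(0.0, 1.0, n_centers)
    K = sq_exp(centers, centers)
    coef = np.linalg.solve(K + 1e-10 * np.eye(n_centers), f(centers))
    approx = sq_exp(grid, centers) @ coef
    return np.max(np.abs(approx - f(grid)))

f = lambda x: np.abs(x - 0.5)      # continuous on [0, 1], not differentiable at 1/2
grid = np.linspace(0.0, 1.0, 501)

err_few = rkhs_sup_error(4, f, grid)    # coarse: large sup-norm error
err_many = rkhs_sup_error(30, f, grid)  # finer: noticeably smaller error
```

Note that this only probes the density of $K(X)$ in $C(X)$; it says nothing yet about whether GPR's posterior mean, fed noisy data, actually converges to such an approximant, which is the caveat discussed next.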
This result is much less powerful than it looks at first sight. Even if $f^*$ is in the (closure of the) space of posterior means which can be generated by GPR, we haven't proved that it is the particular posterior mean returned by GPR for a large enough training set, where of course the training set consists of noisy observations of $f$ at points $\mathbf{x}_1,\dots,\mathbf{x}_n$. We haven't even proved that the posterior mean returned by GPR converges at all as $n \to \infty$! This is the second interpretation suggested by @Dougal. The answer to this question depends on the answer to the first one: if there is no function $f^*$ in the RKHS which is a "good approximation" to $f$, we obviously cannot hope that the posterior mean returned by GPR converges to one. However, it's a different question. If you would like an answer to this question too, please ask a new question.