I’m reading the GPML book, and Chapter 2 (page 15) explains how to do regression using a Gaussian process (GP), but I’m having a hard time figuring out how it works.

In Bayesian inference for parametric models, we first choose a prior on the model parameters $\theta$, that is $p(\theta)$; second, given the training data $D$, we compute the likelihood $p(D|\theta)$; and finally we obtain the posterior $p(\theta|D)$, which is then used in the

predictive distribution $p(y^*|x^*,D)=\int p(y^*|x^*,\theta)\,p(\theta|D)\,d\theta$. The above is what we do in Bayesian inference for parametric models, right? Well, as the book says, a GP is non-parametric, and as far as I understand it, after specifying the

mean function $m(x)$ and the covariance function $k(x,x')$, we have a GP over the function $f$, $f \sim \mathcal{GP}(m,k)$, and this is the prior on $f$. Now suppose I have a noise-free training data set $D=\{(x_1,f_1),\dots,(x_n,f_n)\}$. I thought I should compute the likelihood $p(D|f)$ and then the posterior $p(f|D)$, and finally use the posterior to make predictions. HOWEVER, that’s not what the book does! After specifying the prior $p(f)$, it doesn’t compute the likelihood and posterior, but goes straight to the predictive distribution.
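To make the prior concrete, here is a minimal sketch of what $f \sim \mathcal{GP}(m,k)$ means in practice: for any finite set of inputs, the function values are jointly Gaussian. The zero mean and squared-exponential kernel below are illustrative choices (the book's running example), not the only options; the grid and length-scale are my own assumptions.

```python
import numpy as np

# Hypothetical choices of mean and covariance function: zero mean and a
# squared-exponential kernel; the length-scale and input grid are
# illustrative, not taken from the book.
def m(x):
    return np.zeros_like(x)

def k(x, xp, length_scale=1.0):
    return np.exp(-0.5 * (x[:, None] - xp[None, :]) ** 2 / length_scale**2)

# A GP prior means: for any finite set of inputs x, the vector of function
# values f(x) is multivariate normal with mean m(x) and covariance k(x, x).
x = np.linspace(-5, 5, 50)
K = k(x, x) + 1e-9 * np.eye(len(x))  # small jitter for numerical stability
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(m(x), K, size=3)  # three draws from the prior
print(samples.shape)  # (3, 50)
```

Each row of `samples` is one random function evaluated on the grid; plotting them is the usual way to visualize a GP prior.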

Questions:

1) Why not compute the likelihood and posterior? Is it just because a GP is non-parametric that we don’t do that?

2) As done in the book (pages 15–16), it derives the

predictive distribution via the joint distribution of the training outputs $\mathbf{f}$ and the test outputs $\mathbf{f}^*$, which is termed the joint prior. This confuses me badly: why join them together?

3) I saw some articles call $f$ a latent variable. Why?
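For reference, here is what I believe the "joint prior" in question 2 looks like in the book's noise-free case, together with the Gaussian conditioning step that turns it into the predictive distribution:

```latex
% Joint prior over training outputs f and test outputs f^* (zero mean):
\begin{bmatrix} \mathbf{f} \\ \mathbf{f}^* \end{bmatrix}
\sim \mathcal{N}\!\left( \mathbf{0},
\begin{bmatrix} K(X,X) & K(X,X^*) \\ K(X^*,X) & K(X^*,X^*) \end{bmatrix} \right)

% Conditioning this joint Gaussian on the observed f gives the
% predictive distribution directly:
\mathbf{f}^* \mid X^*, X, \mathbf{f} \sim
\mathcal{N}\!\big( K(X^*,X)\,K(X,X)^{-1}\mathbf{f},\;
K(X^*,X^*) - K(X^*,X)\,K(X,X)^{-1}K(X,X^*) \big)
```

So "joining them together" is what lets the standard conditional-Gaussian identity do the work of Bayes' rule in one step.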

**Answer**

and the above is what we do in Bayesian inference for parametric

models, right?

The book is using Bayesian model averaging, which works the same way for parametric models as for any other Bayesian method, given that you have a posterior over your parameters.

Now I have a noise-free training data set

It doesn’t need to be ‘noise-free’; see the later pages.

HOWEVER, that’s not what the book does! I mean, after specifying the

prior p(f), it doesn’t compute the likelihood and posterior, but just

goes straight to the predictive distribution.

See this: https://people.cs.umass.edu/~wallach/talks/gp_intro.pdf

I believe page 17 of those slides gives the prior, and later slides the likelihood. If you write out the derivation, find the posterior, and then average over the posterior for prediction (as in the weight-space view), it will result in the same equations for the mean and covariance as on page 19.
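To see that the joint-prior route really does produce a posterior, here is a minimal sketch of the noise-free predictive equations obtained by conditioning the joint Gaussian over $(\mathbf{f}, \mathbf{f}^*)$ on the observed $\mathbf{f}$. The kernel and data are illustrative assumptions, not from the book.

```python
import numpy as np

# Illustrative squared-exponential kernel (an assumption, not fixed by the book).
def k(a, b, length_scale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

X = np.array([-1.0, 0.0, 1.0])   # training inputs
f = np.sin(X)                    # noise-free observations f_i = f(x_i)
Xs = np.array([-1.0, 0.5])       # test inputs; the first repeats a training input

K = k(X, X) + 1e-9 * np.eye(len(X))  # K(X, X) with jitter for stability
Ks = k(Xs, X)                        # K(X*, X)
Kss = k(Xs, Xs)                      # K(X*, X*)

# Predictive distribution f* | X*, X, f by Gaussian conditioning:
mean = Ks @ np.linalg.solve(K, f)
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)

# With no noise, the posterior interpolates the data: at a training input
# the predictive mean equals the observed value and the variance collapses.
print(np.allclose(mean[0], np.sin(-1.0), atol=1e-6))  # True
print(cov[0, 0] < 1e-6)                               # True
```

This is exactly Bayesian updating: the predictive mean and covariance are the posterior over function values at the test inputs, even though no explicit "likelihood then posterior" step appears.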

**Attribution**
*Source: Link, Question Author: avocado, Answer Author: Daniel*