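We can write Bayes’ theorem as
$$p(\theta \mid x) = \frac{f(X \mid \theta)\, p(\theta)}{\int_\theta f(X \mid \theta)\, p(\theta)\, d\theta}$$
where p(θ|x) is the posterior, f(X|θ) is the conditional distribution, and p(θ) is the prior,
or as
$$p(\theta \mid x) = \frac{L(\theta \mid x)\, p(\theta)}{\int_\theta L(\theta \mid x)\, p(\theta)\, d\theta}$$
where p(θ|x) is the posterior, L(θ|x) is the likelihood function, and p(θ) is the prior.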
My questions are:
- Why is Bayesian analysis done using the likelihood function and not the conditional distribution?
- Can you say in words what the difference between the likelihood and conditional distribution is? I know the likelihood is not a probability distribution and L(θ|x)∝f(X|θ).
Suppose that you have random variables $X_1, \dots, X_n$ (whose values will be observed in your experiment) that are conditionally independent, given that Θ=θ, with conditional densities $f_{X_i \mid \Theta}(\cdot \mid \theta)$, for $i = 1, \dots, n$. This is your (postulated) statistical (conditional) model, and the conditional densities express, for each possible value θ of the (random) parameter Θ, your uncertainty about the values of the $X_i$'s before you have access to any real data. With the help of the conditional densities you can, for example, compute conditional probabilities like
$$P\{X_1 \in B_1, \dots, X_n \in B_n \mid \Theta = \theta\} = \int_{B_1 \times \dots \times B_n} \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta)\, dx_1 \dots dx_n$$
for each θ.
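To make the "pre-sample" use of the conditional densities concrete, here is a minimal sketch. It assumes, purely for illustration, that each observable is conditionally Exponential with rate θ (this model, the intervals, and the function names `interval_prob` and `joint_prob` are not part of the original answer). Because of conditional independence, the joint probability is just a product of one-dimensional integrals of the conditional density.

```python
import math

def interval_prob(a, b, theta):
    """P(a < X <= b | Theta = theta) for the assumed model X | theta ~ Exponential(rate=theta).

    This is the integral of the conditional density theta * exp(-theta * x) over (a, b].
    """
    return math.exp(-theta * a) - math.exp(-theta * b)

def joint_prob(intervals, theta):
    """P(X_1 in B_1, ..., X_n in B_n | Theta = theta) with B_i = (a_i, b_i].

    Conditional independence given theta turns the joint integral into a product.
    """
    p = 1.0
    for a, b in intervals:
        p *= interval_prob(a, b, theta)
    return p

# Hypothetical events B_1, B_2, B_3, chosen before any data are seen.
intervals = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
for theta in (0.5, 1.0, 2.0):
    print(theta, joint_prob(intervals, theta))
```

Here θ is held fixed inside each call and the events about the observables vary; this is the direction in which the conditional density behaves as a genuine probability density.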
After you have access to an actual sample $(x_1, \dots, x_n)$ of values (realizations) of the $X_i$'s that have been observed in one run of your experiment, the situation changes: there is no longer uncertainty about the observables $X_1, \dots, X_n$. Suppose that the random Θ assumes values in some parameter space Π. Now, for those known (fixed) values $(x_1, \dots, x_n)$, you define a function $L_{x_1, \dots, x_n} : \Pi \to \mathbb{R}$ by
$$L_{x_1, \dots, x_n}(\theta) = \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta).$$
Note that $L_{x_1, \dots, x_n}$, known as the “likelihood function”, is a function of θ. In this “after you have data” situation, the likelihood $L_{x_1, \dots, x_n}$ contains, for the particular conditional model that we are considering, all the information about the parameter Θ contained in this particular sample $(x_1, \dots, x_n)$. In fact, it happens that the likelihood, regarded as a function of the sample, is a sufficient statistic for Θ.
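A small numerical sketch may help separate the two objects. It again assumes an Exponential(rate θ) model and a made-up sample (both are illustrative assumptions, as are the helper names `likelihood` and `midpoint_integral`). With θ fixed, the conditional density integrates to 1 over x; with the data fixed, the likelihood is a function of θ whose integral over θ has no reason to equal 1.

```python
import math

def likelihood(theta, data):
    """L_{x_1,...,x_n}(theta): product of the assumed conditional densities theta * exp(-theta * x_i)."""
    return math.prod(theta * math.exp(-theta * x) for x in data)

def midpoint_integral(g, lo, hi, steps):
    """Plain midpoint-rule approximation of the integral of g over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

data = [0.8, 1.3, 0.4, 2.1]  # hypothetical observed sample, held fixed from here on

# Density in x for a fixed theta: integrates to (approximately) 1.
area_over_x = midpoint_integral(lambda x: 1.5 * math.exp(-1.5 * x), 0.0, 40.0, 40000)

# Likelihood in theta for fixed data: the area is not 1.
area_over_theta = midpoint_integral(lambda t: likelihood(t, data), 0.0, 30.0, 30000)

print(area_over_x, area_over_theta)

# In this model the likelihood depends on the sample only through n and sum(x_i),
# so another sample with the same n and the same sum gives the same function of theta.
other = [1.15, 1.15, 1.15, 1.15]
print(math.isclose(likelihood(0.7, data), likelihood(0.7, other), rel_tol=1e-9))
```

For this particular model the exact area under the likelihood is n!/(Σxᵢ)ⁿ⁺¹, which is far below 1 for the sample above; the last check illustrates the sense in which the likelihood carries all the sample's information about Θ.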
Answering your question: to understand the differences between the concepts of conditional density and likelihood, keep in mind their mathematical definitions (which are clearly different: they are different mathematical objects, with different properties), and also remember that the conditional density is a “pre-sample” object/concept, while the likelihood is an “after-sample” one. I hope that all this also helps you to answer why Bayesian inference (using your way of putting it, which I don’t think is ideal) is done “using the likelihood function and not the conditional distribution”: the goal of Bayesian inference is to compute the posterior distribution, and to do so we condition on the observed (known) data.
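As a rough illustration of conditioning on the observed data, the sketch below computes a posterior on a grid as prior times likelihood, normalized. The Exponential(rate θ) model, the Gamma(shape 2, rate 1) prior, the sample, and the names `likelihood`, `prior`, and `grid_posterior` are all assumptions made for the example. It also checks that rescaling the likelihood by a constant leaves the posterior unchanged, which is why only the likelihood's shape in θ matters.

```python
import math

data = [0.8, 1.3, 0.4, 2.1]  # hypothetical observed sample (fixed)

def likelihood(theta, scale=1.0):
    """Likelihood of theta for the fixed data, optionally multiplied by a constant."""
    return scale * math.prod(theta * math.exp(-theta * x) for x in data)

def prior(theta):
    """Assumed Gamma(shape=2, rate=1) prior density: theta * exp(-theta)."""
    return theta * math.exp(-theta)

def grid_posterior(lik, upper=30.0, steps=30000):
    """Posterior density on a midpoint grid: prior * likelihood divided by its integral."""
    h = upper / steps
    grid = [(k + 0.5) * h for k in range(steps)]
    unnormalized = [lik(t) * prior(t) for t in grid]
    evidence = h * sum(unnormalized)  # approximates the denominator in Bayes' theorem
    return grid, [u / evidence for u in unnormalized], h

grid, post, h = grid_posterior(likelihood)
_, post_scaled, _ = grid_posterior(lambda t: likelihood(t, scale=1000.0))

post_mean = h * sum(t * p for t, p in zip(grid, post))
max_diff = max(abs(p - q) for p, q in zip(post, post_scaled))
print(post_mean, max_diff)
```

With this conjugate choice the posterior is Gamma(2 + n, 1 + Σxᵢ), so the grid mean should be close to (2 + 4)/(1 + 4.6) ≈ 1.0714, and `max_diff` should be zero up to floating-point error.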