I am currently reading the chapter on Bayesian methods in *Computational Molecular Evolution* by Yang. Section 5.2 talks about priors, and specifically non-informative/flat/vague/diffuse priors, conjugate priors, and hyperpriors.

This might be asking for an oversimplification, but could someone explain simply the difference between these types of priors, and how that choice affects the outcome of an analysis and the decisions I would make during a Bayesian analysis?

(I’m not a statistician and I am just starting out on the road to learning Bayesian analysis, so the more it is in layman’s terms the better.)

**Answer**

Simply put, a flat/non-informative prior is used when one has **little or no prior knowledge about the parameter**, and hence it has the least effect on the outcome of your analysis: the posterior inference is driven mainly by the data.
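As a toy illustration (my own, not from the book): a flat Beta(1, 1) prior on a coin's heads probability combines with Binomial data to give a Beta(1 + heads, 1 + tails) posterior, whose mean sits very close to the raw sample proportion.

```python
# A flat Beta(1, 1) prior on a coin's heads probability.
# With Binomial data the posterior is Beta(1 + heads, 1 + tails),
# so the posterior mean stays close to the raw sample proportion:
# the flat prior barely pulls the estimate anywhere.
heads, tails = 7, 3
a_post, b_post = 1 + heads, 1 + tails        # posterior Beta parameters
posterior_mean = a_post / (a_post + b_post)  # 8 / 12 = 0.666...
sample_prop = heads / (heads + tails)        # 7 / 10 = 0.7
print(posterior_mean, sample_prop)
```

With more data the two numbers converge, which is exactly the "least effect on the outcome" behaviour described above.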

A prior is **conjugate** to a likelihood when the resulting posterior belongs to the same family of distributions as the prior. Conjugate priors are favoured for their **algebraic convenience**: the posterior has a closed form, especially when the likelihood belongs to the exponential family (Gaussian, binomial, Poisson, etc.). This is hugely beneficial when carrying out posterior simulation using Gibbs sampling.
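The Beta–Binomial pair is the classic example of this convenience. The sketch below (function name `beta_binomial_update` is my own) shows that the update is just adding counts, and that updating in batches gives the same posterior as one big update:

```python
def beta_binomial_update(a, b, heads, tails):
    """Conjugate update: Beta(a, b) prior + Binomial data -> Beta posterior.

    Because the Beta prior is conjugate to the Binomial likelihood, the
    posterior is again a Beta; no integration is needed, just add the
    observed counts to the prior parameters.
    """
    return a + heads, b + tails

# Start from a Beta(2, 2) prior and observe 10 flips with 7 heads.
a_post, b_post = beta_binomial_update(2, 2, 7, 3)
print(a_post, b_post)  # 9, 5

# Updating in two batches gives the same posterior as one batch --
# the kind of closed-form step that Gibbs sampling exploits repeatedly.
step1 = beta_binomial_update(2, 2, 4, 1)
step2 = beta_binomial_update(*step1, 3, 2)
print(step2)  # (9, 5)
```

Without conjugacy, each of those updates would require numerical integration instead of simple arithmetic.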

And finally, imagine that a prior distribution is set on a parameter in your model, but you want to add another level of complexity/uncertainty. You would then impose a prior distribution on the parameters of that prior (the hyperparameters), hence the name **hyper**-prior.
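A minimal sketch of such a two-level hierarchy (the model and the function `draw_p` are my own toy illustration, not from the book): the Beta prior's mean and concentration are themselves drawn from hyperpriors, so sampling a parameter means first sampling its hyperparameters.

```python
import random

random.seed(0)

# Toy hierarchical model:
#   hyperpriors:  mean m ~ Uniform(0.2, 0.8), concentration k ~ Uniform(2, 20)
#   prior:        p ~ Beta(m * k, (1 - m) * k)
# The Beta's parameters are themselves random, governed by the hyperpriors.
def draw_p():
    m = random.uniform(0.2, 0.8)   # hyperparameter: prior mean of p
    k = random.uniform(2.0, 20.0)  # hyperparameter: prior concentration
    return random.betavariate(m * k, (1.0 - m) * k)

# Marginal draws of p average out the uncertainty in the hyperparameters.
samples = [draw_p() for _ in range(10_000)]
print(sum(samples) / len(samples))  # close to 0.5 by symmetry of the hyperprior
```

The extra level lets the data inform how concentrated the prior on `p` should be, instead of fixing that by hand.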

I think Gelman’s *Bayesian Data Analysis* is a great start for anyone who’s interested in learning Bayesian statistics :)

*Attribution — Source: Link, Question Author: rg255, Answer Author: honeychip*