It is well known that as you accumulate more evidence (say, in the form of a larger n for n i.i.d. examples), the Bayesian prior gets "forgotten", and the inference is increasingly driven by the evidence (the likelihood).
It is easy to see this in various specific cases (such as a Bernoulli likelihood with a Beta prior, or other conjugate examples) – but is there a way to see it in the general case with x_1,\ldots,x_n \sim p(x \mid \mu) and some prior p(\mu)?
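For concreteness, here is a small numerical sketch of the Bernoulli–Beta case mentioned above (all numbers are illustrative choices, not from the question): two quite different Beta priors are updated on the same data via the standard conjugate rule Beta(a + k, b + n − k), and their posterior means converge as n grows.

```python
# Illustrative sketch: two different Beta priors on a Bernoulli parameter
# are "forgotten" as n grows (conjugate update: Beta(a + k, b + n - k)).
import random

random.seed(0)
true_mu = 0.7
priors = [(1.0, 1.0), (20.0, 2.0)]  # two quite different Beta(a, b) priors

for n in [10, 100, 10000]:
    k = sum(random.random() < true_mu for _ in range(n))  # number of successes
    # posterior mean of Beta(a + k, b + n - k) is (a + k) / (a + b + n)
    means = [(a + k) / (a + b + n) for (a, b) in priors]
    print(n, [round(m, 3) for m in means])
```

For small n the two posterior means differ noticeably; by n = 10000 both are essentially at the empirical frequency, regardless of the prior.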
EDIT: I am guessing it cannot be shown in the general case for an arbitrary prior (for example, a point-mass prior would keep the posterior a point mass). But perhaps there are certain conditions under which a prior is forgotten.
Here is the kind of "path" I am thinking of for showing something like that:
Assume the parameter space is \Theta, and let p(\theta) and q(\theta) be two priors that place positive probability on all of \Theta. Then the two posterior calculations, one for each prior, amount to:
p(\theta \mid x_1,\ldots,x_n) = \frac{\prod_i p(x_i \mid \theta)\, p(\theta)}{\int_{\Theta} \prod_i p(x_i \mid \theta')\, p(\theta')\, d\theta'}
and
q(\theta \mid x_1,\ldots,x_n) = \frac{\prod_i p(x_i \mid \theta)\, q(\theta)}{\int_{\Theta} \prod_i p(x_i \mid \theta')\, q(\theta')\, d\theta'}
If you divide the posterior p by the posterior q, the likelihood products in the numerators cancel and you get:
p(\theta \mid x_1,\ldots,x_n) / q(\theta \mid x_1,\ldots,x_n) = \frac{p(\theta) \int_{\Theta} \prod_i p(x_i \mid \theta')\, q(\theta')\, d\theta'}{q(\theta) \int_{\Theta} \prod_i p(x_i \mid \theta')\, p(\theta')\, d\theta'}
Now I would like to study this ratio as n goes to \infty. Ideally it would go to 1 for a \theta that "makes sense" (say, the true parameter), or show some other nice behavior, but I can't figure out how to prove anything there.
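One can at least probe this ratio numerically. The sketch below (a hypothetical setup, not from the question) uses a discrete parameter space for a coin, so the normalizing integrals become sums, and evaluates the posterior ratio p(\theta \mid x)/q(\theta \mid x) at the true parameter for growing n. As the likelihood concentrates on the true value, the ratio there tends to 1:

```python
# Illustrative sketch of the posterior ratio for a discrete parameter space:
# a coin with mu in {0.3, 0.5, 0.7} under two different priors p and q.
import math
import random

random.seed(1)
thetas = [0.3, 0.5, 0.7]
p_prior = [0.8, 0.1, 0.1]      # prior p, heavily favouring the wrong value
q_prior = [1/3, 1/3, 1/3]      # prior q, uniform
true_mu = 0.7                  # data are generated from thetas[2]

def posterior(prior, k, n):
    # log-likelihood of k heads in n tosses under each theta, plus log-prior
    logw = [k * math.log(t) + (n - k) * math.log(1 - t) + math.log(pi)
            for t, pi in zip(thetas, prior)]
    m = max(logw)              # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logw]
    z = sum(w)
    return [wi / z for wi in w]

for n in [10, 100, 2000]:
    k = sum(random.random() < true_mu for _ in range(n))
    p_post = posterior(p_prior, k, n)
    q_post = posterior(q_prior, k, n)
    print(n, round(p_post[2] / q_post[2], 4))
```

For small n the ratio at the true \theta reflects the disagreement between the priors; for large n it approaches 1, which is exactly the "nice behavior" asked about, at least in this discrete setting.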
Answer
Just a rough, but hopefully intuitive answer.

Look at it from the log-space point of view:
\log P(\theta \mid x_1, \ldots, x_n) = \log P(\theta) + \sum_{i=1}^n \log P(x_i \mid \theta) - C_n
where C_n is a normalizing constant that depends on the data but not on the parameter, and where the likelihood assumes i.i.d. observations. Hence, just concentrate on the part that determines the shape of your posterior, namely
S_n = \log P(\theta) + \sum_{i=1}^n \log P(x_i \mid \theta)

Assume that there is a D>0 such that |\log P(\theta)| \leq D. This is reasonable for discrete distributions in which every atom gets positive mass.

Since the log-likelihood terms \log P(x_i \mid \theta) are all nonpositive, |S_n| "will" grow with n (I'm skipping the technicalities here). But the contribution of the prior is bounded in absolute value by D. Hence, the fraction contributed by the prior, which is at most D/|S_n|, decreases with each additional observation.
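A small numerical sketch of this claim (with hypothetical values for the prior atom and the parameter, not from the answer): track S_n for i.i.d. Bernoulli draws and print the prior's share D/|S_n| as n grows.

```python
# Illustrative sketch: the prior's share of the (unnormalized) log-posterior
# S_n = log P(theta) + sum_i log P(x_i | theta) shrinks like D / |S_n|.
import math
import random

random.seed(2)
theta = 0.7                      # evaluate S_n at a fixed parameter value
log_prior = math.log(0.1)        # a discrete prior atom, so |log P(theta)| <= D
s = log_prior
for n in range(1, 5001):
    x = random.random() < theta  # i.i.d. Bernoulli(theta) draw
    s += math.log(theta if x else 1 - theta)
    if n in (10, 100, 5000):
        print(n, round(abs(log_prior) / abs(s), 4))
```

Because every log-likelihood term added to s is negative, |S_n| grows roughly linearly in n (at the rate of the entropy of the data distribution), so the prior's share decays like 1/n.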
Rigorous proofs of course have to face the technicalities (and they can be very difficult), but the setting above is, IMHO, the basic idea.
Attribution
Source: Link, Question Author: bayesianOrFrequentist, Answer Author: Pedro A. Ortega