Which one is better, maximum likelihood or marginal likelihood, and why?

When performing regression, suppose we go by the definitions given in "What is the difference between a partial likelihood, profile likelihood and marginal likelihood?":

Maximum Likelihood
Find the β and θ that maximize L(β, θ|data).

Marginal Likelihood
Integrate θ out of the likelihood by exploiting the fact that we can identify the probability distribution of θ conditional on β.

Which of these is the better quantity to maximize, and why?


The two approaches can give different results, and each has a different interpretation. The first finds the pair (β, θ) that is jointly most probable, while the second finds the β that is (marginally) most probable. Imagine that your joint distribution over β and θ looks like this:

θ \ β   β=1   β=2
θ=1     0.0   0.2
θ=2     0.1   0.2
θ=3     0.3   0.2

Then the maximum likelihood answer is β=1 (with θ=3), because the single largest entry, 0.3, sits there, while the maximum marginal likelihood answer is β=2 (since, marginalizing over θ, P(β=2)=0.6 versus P(β=1)=0.4).
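
To make the arithmetic concrete, here is a minimal sketch that reproduces the toy example in Python. The dictionary layout and the variable names (`joint`, `marginal`, `joint_mode`, `marginal_mode`) are illustrative assumptions, and the numbers are simply the six table entries treated as a discrete joint distribution.

```python
# Toy joint distribution from the example above, keyed as (beta, theta).
# The dict layout and variable names are illustrative assumptions.
joint = {
    (1, 1): 0.0, (2, 1): 0.2,
    (1, 2): 0.1, (2, 2): 0.2,
    (1, 3): 0.3, (2, 3): 0.2,
}

# Joint maximization: pick the single (beta, theta) cell with the largest value.
joint_mode = max(joint, key=joint.get)

# Marginalization: sum over theta for each beta, then maximize over beta alone.
marginal = {}
for (beta, theta), p in joint.items():
    marginal[beta] = marginal.get(beta, 0.0) + p
marginal_mode = max(marginal, key=marginal.get)

print("joint maximum at (beta, theta) =", joint_mode)
print("marginal over beta =", {b: round(p, 3) for b, p in marginal.items()})
print("marginal maximum at beta =", marginal_mode)
```

The joint maximum lands on β=1, θ=3, while the marginal maximum lands on β=2, which is the disagreement described above. In a continuous problem the sum over θ would become an integral, which this sketch does not attempt.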

I’d say that in general, the marginal likelihood is often what you want – if you really don’t care about the values of the θ parameters, then you should just collapse over them. But probably in practice these methods will not yield very different results – if they do, then it may point to some underlying instability in your solution, e.g. multiple modes with different combinations of β,θ that all give similar predictions.

Source: Link, Question Author: Ankit Chiplunkar, Answer Author: Chris
