When performing regression, suppose we go by the definitions from "What is the difference between a partial likelihood, profile likelihood and marginal likelihood?", namely:

Maximum Likelihood

Find the β and θ that maximize L(β, θ | data).

Marginal Likelihood

We integrate out θ from the likelihood equation by exploiting the fact that we can identify the probability distribution of θ conditional on β.

Which is the better methodology to maximize, and why?

**Answer**

These two approaches can give different results, and they have different interpretations. The first finds the pair (β, θ) that is jointly most probable, while the second finds the β that is (marginally) most probable. Imagine that your distribution looks like this:

|     | β=1 | β=2 |
|-----|-----|-----|
| θ=1 | 0.0 | 0.2 |
| θ=2 | 0.1 | 0.2 |
| θ=3 | 0.3 | 0.2 |

Then the maximum likelihood answer is β=1 (with θ=3, the single largest cell at 0.3), while the maximum marginal likelihood answer is β=2 (since, marginalizing over θ, P(β=2)=0.6 versus P(β=1)=0.4).
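The arithmetic in this toy example can be checked with a short sketch. The array layout and the names `joint`, `theta_values` and `beta_values` are illustrative assumptions, not part of the original answer; NumPy is used only for convenience.

```python
import numpy as np

# Joint table from the example: rows are θ = 1, 2, 3; columns are β = 1, 2.
joint = np.array([
    [0.0, 0.2],
    [0.1, 0.2],
    [0.3, 0.2],
])
theta_values = np.array([1, 2, 3])
beta_values = np.array([1, 2])

# Joint maximum: locate the single (θ, β) cell with the largest value.
theta_idx, beta_idx = np.unravel_index(np.argmax(joint), joint.shape)
joint_max = (int(beta_values[beta_idx]), int(theta_values[theta_idx]))

# Marginal maximum: sum over θ (axis 0) first, then pick the largest β.
marginal_beta = joint.sum(axis=0)
marginal_max = int(beta_values[np.argmax(marginal_beta)])

print("joint maximum (β, θ):", joint_max)
print("marginal over β:", marginal_beta)
print("marginal maximum β:", marginal_max)
```

The two procedures differ only in the order of operations: the first takes the argmax over the full table, the second sums out θ and then takes the argmax over what remains.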

I’d say that in general, the marginal likelihood is often what you want – if you really don’t care about the values of the θ parameters, then you should just collapse over them. But probably in practice these methods will not yield very different results – if they do, then it may point to some underlying instability in your solution, e.g. multiple modes with different combinations of β,θ that all give similar predictions.

**Attribution**

*Source: Link, Question Author: Ankit Chiplunkar, Answer Author: Chris*