In a regression setting, going by the definitions given in "What is the difference between a partial likelihood, profile likelihood and marginal likelihood?":
Maximum likelihood:
Find the β and θ that maximize L(β, θ | data).
Marginal likelihood:
Integrate θ out of the likelihood, exploiting the fact that we can identify the probability distribution of θ conditional on β.
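Written out, the two objectives might look as follows. This is only a sketch of the definitions above: the notation p(θ | β) for the conditional distribution of θ given β, and the hat symbols for the maximizers, are assumptions introduced here and are not part of the quoted definitions.

$$
(\hat\beta_{\mathrm{ML}}, \hat\theta_{\mathrm{ML}}) = \arg\max_{\beta,\,\theta} \; L(\beta, \theta \mid \text{data}),
\qquad
\hat\beta_{\mathrm{marg}} = \arg\max_{\beta} \int L(\beta, \theta \mid \text{data}) \, p(\theta \mid \beta) \, d\theta .
$$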
Which of the two is the better quantity to maximize, and why?
Each of these gives a different result with a different interpretation. The first finds the pair (β, θ) that is jointly most probable, while the second finds the β that is (marginally) most probable. Imagine that your distribution looks like this:
Then the maximum likelihood answer is β=1 (θ=3), while the maximum marginal likelihood answer is β=2 (since, marginalizing over θ, P(β=2)=0.6).
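To make the distinction concrete, here is a minimal sketch in Python. The joint probabilities below are hypothetical, chosen only so that they agree with the numbers quoted above (joint maximum at β=1, θ=3, and marginal P(β=2)=0.6); they are not the values from the original table.

```python
# Hypothetical joint distribution P(beta, theta).
# These values are invented for illustration; they are only chosen to be
# consistent with the numbers quoted in the text.
joint = {
    (1, 1): 0.05, (1, 2): 0.05, (1, 3): 0.30,
    (2, 1): 0.20, (2, 2): 0.20, (2, 3): 0.20,
}

# Joint maximization: pick the (beta, theta) pair with the highest probability.
beta_joint, theta_joint = max(joint, key=joint.get)

# Marginalization: sum over theta for each value of beta.
marginal = {}
for (beta, theta), p in joint.items():
    marginal[beta] = marginal.get(beta, 0.0) + p

# Marginal maximization: pick the beta with the highest marginal probability.
beta_marginal = max(marginal, key=marginal.get)

print(f"joint maximum:    beta={beta_joint}, theta={theta_joint}")
print(f"marginal maximum: beta={beta_marginal}, P(beta={beta_marginal})={marginal[beta_marginal]:.1f}")
```

With these made-up numbers, no single cell under β=2 beats the cell at (β=1, θ=3), yet β=2 carries more total probability once θ is summed out, which is why the two criteria disagree.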
I’d say that, in general, the marginal likelihood is often what you want: if you really don’t care about the values of the θ parameters, then you should just collapse over them. In practice, though, the two methods will probably not yield very different results. If they do, it may point to some underlying instability in your solution, e.g. multiple modes with different combinations of β and θ that all give similar predictions.