# Does likelihood ratio test control for overfitting?

I have two nested logistic regression models, A and B, where A is nested under B. Say B has $K$ more features than A. B has a higher log-likelihood than A; however, the improved likelihood of B is due to the fact that the $K$ extra features easily overfit the data. If I apply the likelihood ratio test in my case, it suggests that the more complicated model, B, is a significant improvement. So I think the likelihood ratio test is flawed in such a case.

• How can we determine whether the added features cause an overfitting problem?
• Does the likelihood ratio test always return the correct answer?

Given the $K$ additional features, the LR test statistic will follow an asymptotic $\chi^2$ distribution with $K$ degrees of freedom if the null is true (and other auxiliary assumptions hold, e.g., a suitable regression setting, weak dependence assumptions, etc.), i.e., if the additional predictors in $B$ are just noise features that lead to "overfitting".
The figure below plots the 0.95-quantiles of the $\chi^2_K$ distribution as a function of $K$, i.e., the value that the LR statistic needs to exceed to reject the null that $A$ is the "good" model.
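In case the figure does not render, the same quantiles can be computed directly. A minimal sketch using `scipy.stats.chi2` (the choice of $K$ values is just for illustration):

```python
from scipy.stats import chi2

# 0.95-quantile of chi^2_K: the critical value the LR statistic must
# exceed (at the 5% level) to reject the smaller model A.
for K in [1, 2, 5, 10, 20, 50]:
    print(f"K = {K:2d}:  critical value = {chi2.ppf(0.95, df=K):.3f}")
```

For instance, one extra feature requires the LR statistic to exceed about 3.84, while twenty extra features raise the bar to about 31.4.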
As you can see, the larger the set of additional features in $B$ that could "overfit" the data, the higher the value the test statistic must reach. So the test suitably makes it harder for the (inevitably) better fit (i.e., higher log-likelihood) of the larger model to be judged "sufficiently" large to reject model $A$.
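The point can be checked by simulation. A hedged sketch (the data-generating process, sample size, and fitting-by-`scipy.optimize` approach are all my own choices, not from the question): model A has one real predictor, model B adds $K$ pure-noise columns, and the LR test is applied. Under this null, the $p$-value is uniform, so B is declared "significantly better" only about 5% of the time despite its inevitably higher log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)

def neg_loglik(beta, X, y):
    # Negative log-likelihood of logistic regression (intercept included in X),
    # with log(1 + exp(z)) computed stably via logaddexp.
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

def max_loglik(X, y):
    # Maximum attainable log-likelihood for design matrix X.
    res = minimize(neg_loglik, np.zeros(X.shape[1]), args=(X, y), method="BFGS")
    return -res.fun

n, K = 500, 5
x = rng.normal(size=(n, 1))          # the one real predictor (model A)
noise = rng.normal(size=(n, K))      # K pure-noise predictors (model B adds these)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x[:, 0])))
y = rng.binomial(1, p)

XA = np.column_stack([np.ones(n), x])        # model A design
XB = np.column_stack([XA, noise])            # model B design (A plus noise)

llA, llB = max_loglik(XA, y), max_loglik(XB, y)
lr = 2.0 * (llB - llA)               # LR statistic, asymptotically chi^2_K
p_value = chi2.sf(lr, df=K)
print(f"log-lik A = {llA:.2f}, log-lik B = {llB:.2f}")
print(f"LR = {lr:.2f}, p-value = {p_value:.3f}")
```

Note that `llB >= llA` holds by construction (the models are nested), yet the test only rejects when the improvement exceeds the $\chi^2_K$ critical value shown above.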