Usually in logistic regression, we fit a model and obtain predicted probabilities on the training set. We then cross-validate those training predictions (something like here) and choose an optimal threshold value based on something like the ROC curve.
Why don’t we incorporate cross-validation of the threshold INTO the actual model, and train the whole thing end-to-end?
A threshold isn’t trained with the model because logistic regression isn’t a classifier (cf., Why isn’t Logistic Regression called Logistic Classification?). It is a model for estimating the parameter, p, that governs a Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you estimate how the parameter controlling that distribution changes as a function of the covariates. It is a direct probability model only. Of course, it can subsequently be used as a classifier, and sometimes is in certain contexts, but it is still a probability model.
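To make the separation concrete, here is a minimal sketch in plain NumPy (the toy data, learning rate, and variable names are all illustrative) that fits the Bernoulli probability model by maximizing its likelihood, and only afterwards sweeps thresholds as a separate decision step layered on top of the probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one covariate; the true model is p = sigmoid(1.5*x - 0.5)
n = 500
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(1.5 * x - 0.5)))
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones(n), x])  # design matrix with intercept

# Fit by maximizing the Bernoulli log-likelihood (gradient ascent).
# This is the entire "model": it estimates p, nothing about classes.
beta = np.zeros(2)
for _ in range(2000):
    p_hat = 1 / (1 + np.exp(-X @ beta))
    score = X.T @ (y - p_hat)   # gradient of the Bernoulli log-likelihood
    beta += 0.1 * score / n

p_hat = 1 / (1 + np.exp(-X @ beta))  # the model's output: probabilities

# Classification is a separate decision step: sweep candidate thresholds
# and pick one by some utility -- plain accuracy here, purely for example.
thresholds = np.linspace(0.05, 0.95, 19)
acc = [((p_hat >= t).astype(int) == y).mean() for t in thresholds]
best_t = thresholds[np.argmax(acc)]
```

Note that nothing in the likelihood maximization refers to a threshold at all; `best_t` comes from a downstream decision rule you could swap out (different costs for false positives vs. false negatives would yield a different threshold) without refitting the probability model.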