# How to do cross-validation with a Cox proportional hazards model?

Suppose I have constructed a prediction model for the occurrence of a particular disease in one dataset (the model-building dataset) and now want to check how well the model works in a new dataset (the validation dataset). For a model built with logistic regression, I would calculate the predicted probability for each person in the validation dataset from the coefficients estimated in the model-building dataset; then, after dichotomizing those probabilities at some cutoff value, I could construct a 2×2 table that gives the true positive rate (sensitivity) and the true negative rate (specificity). Moreover, by varying the cutoff I could trace out the entire ROC curve and then compute its AUC.
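To make the logistic-regression case concrete, here is a minimal pure-Python sketch of that procedure (the data are hypothetical; in practice one would use standard library functions for this):

```python
def confusion_at(y_true, p_hat, cutoff):
    """2x2 counts after dichotomizing predicted probabilities at `cutoff`."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= cutoff)
    return tp, fp, tn, fn

def sens_spec(y_true, p_hat, cutoff):
    """Sensitivity and specificity at one cutoff value."""
    tp, fp, tn, fn = confusion_at(y_true, p_hat, cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, p_hat):
    """AUC as the probability that a randomly chosen case receives a higher
    predicted probability than a randomly chosen control (equivalent to the
    area under the ROC curve obtained by varying the cutoff)."""
    cases = [p for y, p in zip(y_true, p_hat) if y == 1]
    controls = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))
```

The rank-based form of the AUC used here is what makes the connection to the c-index for survival data natural: it never needs a cutoff.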

Now suppose that I actually have survival data. So, I fit a Cox proportional hazards model in the model-building dataset and now want to check how well the model works in the validation dataset. Since the baseline hazard is left unspecified (not estimated parametrically) in a Cox model, I do not see how to obtain a predicted survival probability for each person in the validation dataset from the coefficients estimated in the model-building dataset. So, how can I go about checking how well the model works in the validation dataset? Are there established methods for doing this? And if so, are they implemented in any software? Thanks in advance for any suggestions!
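For what it's worth, predicted survival probabilities can be recovered if a nonparametric estimate of the baseline hazard is carried over from the model-building dataset along with the coefficients. The sketch below assumes the Breslow estimator (my choice for illustration, not something any particular software mandates) and hypothetical training data consisting of follow-up times, event indicators, and linear predictors $x'\beta$:

```python
import math

def breslow_cumhaz(times, events, lps):
    """Breslow estimate of the baseline cumulative hazard H0(t) from
    training times, event indicators (1 = event, 0 = censored), and
    linear predictors x'beta. Returns a step function as (time, H0) pairs."""
    risks = [math.exp(lp) for lp in lps]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    H0, cum = [], 0.0
    for t in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ei == 1 and ti == t)
        at_risk = sum(r for ti, r in zip(times, risks) if ti >= t)
        cum += d / at_risk
        H0.append((t, cum))
    return H0

def predicted_survival(H0, lp_new, t):
    """S(t | x) = exp(-H0(t) * exp(x'beta)) for a new subject with
    linear predictor lp_new, using the step-function estimate H0."""
    cum = 0.0
    for ti, h in H0:
        if ti <= t:
            cum = h
        else:
            break
    return math.exp(-cum * math.exp(lp_new))
```

With the baseline step function and the coefficients in hand, each person in the validation dataset gets a predicted survival probability at any time point of interest.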

An ROC curve is not useful in this setting, but the generalized ROC area (the c-index, which requires no dichotomization at all) is. The R `rms` package will compute the c-index along with cross-validated or bootstrap overfitting-corrected versions of it. You can do this without holding back any data, provided you fully pre-specify the model or repeat any backwards stepdown algorithm at each resample. If you truly want to do external validation, i.e., if your validation sample is enormous, you can use the `rms` functions `rcorr.cens` and `val.surv`.
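To see what the c-index measures, here is a minimal pure-Python sketch of Harrell's c for censored data (the `rms` functions compute this, with overfitting corrections, far more conveniently; the data in the test are hypothetical):

```python
def c_index(times, events, risk_scores):
    """Harrell's c-index: among comparable pairs (the subject with the
    earlier time must have had the event), the fraction in which the
    subject who failed earlier also has the higher predicted risk score.
    Tied risk scores count as half-concordant."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable only if i had an event before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

Note how censored subjects still contribute: they form comparable pairs with anyone who had an event before their censoring time, so no dichotomization and no choice of time horizon is needed.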