Can someone explain my Cox model to me in plain English?

I fitted the following Cox regression model to

allof my data using the`cph`

function. My data are saved in an object called`Data`

. The variables`w`

,`x`

, and`y`

are continuous;`z`

is a factor of two levels. Time is measured in months. Some of my patients are missing data for variable`z`

(NB: I have duly noted Dr. Harrell’s suggestion, below, that I impute these values so as to avoid biasing my model, and will do so in the future).`> fit <- cph(formula = Surv(time, event) ~ w + x + y + z, data = Data, x = T, y = T, surv = T, time.inc = 12) Cox Proportional Hazards Model Frequencies of Missing Values Due to Each Variable Surv(time, event) w x y z 0 0 0 0 14 Model Tests Discrimination Indexes Obs 152 LR chi2 8.33 R2 0.054 Events 64 d.f. 4 g 0.437 Center 0.7261 Pr(> chi2) 0.0803 gr 1.548 Score chi2 8.07 Pr(> chi2) 0.0891 Coef S.E. Wald Z Pr(>|Z|) w -0.0133 0.0503 -0.26 0.7914 x -0.0388 0.0351 -1.11 0.2679 y -0.0363 0.0491 -0.74 0.4600 z=1 0.3208 0.2540 1.26 0.2067`

I also tried to test the assumption of proportional hazards by using the

`cox.zph`

command, below, but do not know how to interpret its results. Putting`plot()`

around the command gives an error message.`cox.zph(fit, transform="km", global=TRUE) rho chisq p w -0.1125 1.312 0.2520 x 0.0402 0.179 0.6725 y 0.2349 4.527 0.0334 z=1 0.0906 0.512 0.4742 GLOBAL NA 5.558 0.2347`

## First Problem

- Can someone explain the results of the above output to me in plain English? I have a medical background and no formal training in statistics.
## Second Problem

As suggested by Dr. Harrell, I would like to internally validate my model by performing 100 iterations of 10-fold cross-validation using the

`rms`

package (from what I understand, this would entail building`100 * 10 = 1000`

different models and then asking them to predict the survival times of patients that they had never seen).I tried using the

`validate`

function, as shown.`> v1 <- validate(fit, method="crossvalidation", B = 10, dxy=T) > v1 index.orig training test optimism index.corrected n Dxy -0.2542 -0.2578 -0.1356 -0.1223 -0.1320 10 R2 0.0543 0.0565 0.1372 -0.0806 0.1350 10 Slope 1.0000 1.0000 0.9107 0.0893 0.9107 10 D 0.0122 0.0128 0.0404 -0.0276 0.0397 10 U -0.0033 -0.0038 0.0873 -0.0911 0.0878 10 Q 0.0155 0.0166 -0.0470 0.0636 -0.0481 10 g 0.4369 0.4424 0.6754 -0.2331 0.6700 10`

How do you perform the 100x resampling? I think my above code only performs the cross-validation once.

I then wanted to know how good my model was at prediction. I tried the following:

`> c_index <- abs(v1[1,5])/2 + 0.5 > c_index [1] 0.565984`

Does this mean that my model is only very slightly better than flipping a coin?

## Third Problem

Dr. Harrell points out that I have assumed linearity for the covariate effects, and that the number of events in my sample is just barely large enough to fit a reliable model if all covariate effects happen to be linear.

- Does this mean that I should include some sort of interaction term in my model? If so, any advice as to what to put?

**Answer**

To get started, consider a few things. First, you are excluding too many observations with missing data and this will cause a bias. Consider multiple imputation. Second, there is a plot method for `cox.zph`

which is useful in assessing proportional hazards. Third, you have assumed linearity for the covariate effects. Fourth, the number of events in your training sample is just barely large enough to fit a reliable model if all covariate effects happen to be linear (which is rare). And your test sample would have to have perhaps 400 events before it would yield a reliable assessment of prediction accuracy. It is not clear that you had enough data to split the data into two parts. Resampling validation (100 repeats of 10-fold cross-validation, or use the bootstrap) is a better solution. Both your original external validation (functions `rcorr.cens`

and `val.surv`

) and resampling internal validation (functions `validate`

, `calibrate`

) are implemented in the R `rms`

package. Case studies for the `rms`

package are found in my course notes at http://biostat.mc.vanderbilt.edu/rms (and I have a 3-day course on this in Nashville next month). Note that $2\times 2$ tables are not appropriate for use with continuous data.

**Attribution***Source : Link , Question Author : Alexander , Answer Author : Frank Harrell*