# What is the difference between the “coef” and “(exp)coef” output of coxph in R?

I have been trying to discern what exactly the “coef” and “(exp)coef” output of coxph signify. It seems that the “(exp)coef” are comparisons of the first variable in the model according to the group assigned in the command.

How does the coxph function arrive at the values for “coef” and “(exp)coef”?

Additionally, how does coxph determine these values when there is censoring involved?

If you have a single explanatory variable, say treatment group, a Cox regression model is fitted with coxph(); the coefficient (coef) reads as a regression coefficient (in the context of the Cox model, described hereafter), and its exponential, exp(coef), gives you the hazard ratio of the treatment group relative to the control or placebo group. For example, if $\hat\beta=-1.80$, then the hazard ratio is $\exp(-1.80)=0.165$; that is, the hazard in the treatment group is 16.5% of the hazard in the control group.
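To see where the two columns come from in practice, here is a minimal sketch. It assumes the `lung` data set shipped with the survival package as a stand-in for a two-group study, with `sex` playing the role of the group variable; neither is part of the original question.

```r
library(survival)

# Assumption: `lung` stands in for a two-group study.
# status is coded 1 = censored, 2 = dead; sex is coded 1 = male, 2 = female.
fit <- coxph(Surv(time, status == 2) ~ sex, data = lung)

beta_hat <- coef(fit)       # the "coef" column: log hazard ratio
hr_hat   <- exp(beta_hat)   # the "exp(coef)" column: hazard ratio

# Both quantities as reported by summary()
summary(fit)$coefficients[, c("coef", "exp(coef)")]

# 95% confidence interval for the hazard ratio
exp(confint(fit))
```

So exp(coef) is nothing more than the exponential of coef, reported for convenience because it is the quantity usually interpreted.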

As you may know, the hazard function is modeled as

$$h(t)=h_0(t)\exp(\beta_1x_1+\dots+\beta_px_p)=h_0(t)\exp(\beta'x),$$

where $h_0(t)$ is the baseline hazard. The hazards depend multiplicatively on the covariates, and $\exp(\beta_1)$ is the ratio of the hazards between two individuals whose values of $x_1$ differ by one unit when all other covariates are held constant. The ratio of the hazards of any two individuals $i$ and $j$ is $\exp\big(\beta'(x_i-x_j)\big)$, and is called the hazard ratio (or incidence rate ratio). This ratio is assumed to be constant over time, hence the name proportional hazards.
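The hazard ratio between two individuals can be checked numerically. The sketch below again assumes the `lung` data from the survival package, and the two covariate profiles are hypothetical values chosen only for illustration.

```r
library(survival)

fit <- coxph(Surv(time, status == 2) ~ age + sex, data = lung)

# Two hypothetical individuals (values chosen only for illustration)
nd <- data.frame(age = c(70, 60), sex = c(1, 2))

x_i <- unlist(nd[1, c("age", "sex")])
x_j <- unlist(nd[2, c("age", "sex")])

# Hazard ratio from the formula exp(beta' (x_i - x_j))
hr_formula <- exp(sum(coef(fit) * (x_i - x_j)))

# Same ratio from the relative risks returned by predict()
risk <- predict(fit, newdata = nd, type = "risk")
hr_predict <- unname(risk[1] / risk[2])

all.equal(hr_formula, hr_predict)
```

The baseline hazard $h_0(t)$ cancels in the ratio, which is why no time point has to be specified.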

To echo your preceding question about survreg, here the form of $h_0(t)$ is left unspecified; more precisely, this is a semi-parametric model in that only the effects of covariates are parametrized, and not the hazard function. In other words, we don’t make any distribution assumption about survival times.

The regression parameters are estimated by maximizing the partial log-likelihood defined by

$$\ell=\sum_f\log\left(\frac{\exp(\beta'x_f)}{\sum_{r(f)}\exp(\beta'x_r)}\right),$$

where the first summation is over all deaths or failures $f$, and the second summation is over all subjects $r(f)$ still alive (and hence at risk) at the time of failure; this is known as the risk set. In other words, $\ell$ can be interpreted as the log profile likelihood for $\beta$ after eliminating $h_0(t)$ (that is, the log-likelihood in which the $h_0(t)$ have been replaced by the functions of $\beta$ that maximize the likelihood with respect to $h_0(t)$ for a fixed vector $\beta$).
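The partial log-likelihood can be written out by hand and compared with what coxph() reports. This is a rough sketch, not the algorithm used inside the survival package: the function `partial_loglik` is a hypothetical helper, the `lung` data are an assumed example, and `ties = "breslow"` is requested because the sum over risk sets described above corresponds to the Breslow treatment of tied failure times (the coxph() default is Efron).

```r
library(survival)

# Partial log-likelihood: sum over failures f of
#   beta' x_f - log( sum over the risk set r(f) of exp(beta' x_r) )
partial_loglik <- function(beta, time, event, X) {
  eta <- drop(X %*% beta)                 # linear predictor beta' x
  failures <- which(event)
  terms <- vapply(failures, function(f) {
    at_risk <- time >= time[f]            # risk set at the failure time of f
    eta[f] - log(sum(exp(eta[at_risk])))
  }, numeric(1))
  sum(terms)
}

X     <- as.matrix(lung[, c("age", "sex")])
time  <- lung$time
event <- lung$status == 2                 # 1 = censored, 2 = dead in `lung`

# ties = "breslow" so that coxph uses the same formula when failure times tie
fit <- coxph(Surv(time, status == 2) ~ age + sex, data = lung, ties = "breslow")

c(by_hand = partial_loglik(coef(fit), time, event, X),
  coxph   = fit$loglik[2])
```

Censored subjects never appear in `failures`, but they are counted in `at_risk` for every failure that occurs up to their censoring time.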

About censoring, it is not clear whether you refer to left censoring or left truncation (the latter arises if we consider an origin for the time scale that is earlier than the time when observation began, also called delayed entry), or to right censoring. With right censoring, a censored subject never contributes a term to the sum over failures in the partial log-likelihood, but remains in the risk set of every failure that occurs before its censoring time. In any case, more details about the computation of the regression coefficients and how the survival package handles censoring can be found in Therneau and Grambsch, Modeling Survival Data (Springer, 2000). Terry Therneau is the author of the former S package. An online tutorial is available.
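As a small illustration of how these situations are passed to coxph(), the response is built with Surv(). The times below are made up for the example.

```r
library(survival)

# Right censoring: event = 1 for an observed failure, 0 for a censored time
y_right <- Surv(time = c(5, 8, 12), event = c(1, 0, 1))
y_right                      # censored times are printed with a trailing "+"

# Delayed entry: (start, stop] intervals; a subject is at risk only after `start`
y_entry <- Surv(time = c(0, 2, 4), time2 = c(5, 8, 12), event = c(1, 0, 1))

attr(y_right, "type")        # "right"
attr(y_entry, "type")        # "counting"
```

Either object can be used on the left-hand side of the coxph() formula.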

Survival Analysis in R, by David Diez, provides a good introduction to survival analysis in R. A brief overview of $\chi^2$ tests for regression parameters is given on p. 10. Hopefully, this should help clarify the on-line help quoted by @onestop, “coefficients the coefficients of the linear predictor, which multiply the columns of the model matrix.” For an applied textbook, I recommend Analyzing Medical Data Using S-PLUS, by Everitt and Rabe-Hesketh (Springer, 2001, chap. 16 and 17), from which most of the above comes.
Another useful reference is John Fox’s appendix on Cox Proportional-Hazards Regression for Survival Data.