I have been trying to work out what exactly the “coef” and “exp(coef)” columns in the output of coxph signify. It seems that the “exp(coef)” values are comparisons of the first variable in the model according to the group assigned in the command.

How does the coxph function arrive at the values for “coef” and “exp(coef)”?

Additionally, how does coxph determine these values when there is censoring involved?

**Answer**

If you have a single explanatory variable, say treatment group, a Cox regression model is fitted with `coxph()`; the coefficient (`coef`) reads as a regression coefficient (in the context of the Cox model, described hereafter) and its exponential gives you the hazard ratio of the treatment group relative to the control or placebo group. For example, if $\hat\beta = -1.80$, then the hazard ratio is $\exp(-1.80) = 0.165$; that is, the hazard in the treatment group is 16.5% of the hazard in the control group.
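As a minimal sketch of where these two numbers appear in practice, the following fits a one-covariate Cox model and pulls out `coef` and `exp(coef)`. The data frame `d` and its column names (`time`, `status`, `trt`) are made up for illustration; only `coxph()`, `Surv()`, `coef()`, and `summary()` come from R and the survival package.

```r
library(survival)

# Hypothetical toy data: follow-up time, event indicator
# (1 = event observed, 0 = censored), and group (0 = control, 1 = treatment).
d <- data.frame(
  time   = c(5, 8, 12, 14, 20, 21, 25, 30, 33, 40),
  status = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  trt    = c(0, 0, 1, 0, 1, 0, 1, 0, 1, 1)
)

fit <- coxph(Surv(time, status) ~ trt, data = d)

beta_hat <- coef(fit)      # the "coef" column: log hazard ratio
hr       <- exp(beta_hat)  # the "exp(coef)" column: hazard ratio

print(fit)
summary(fit)$conf.int      # hazard ratio with its 95% confidence interval
```

Here `hr` is the estimated hazard ratio of `trt = 1` relative to `trt = 0`, and it should agree with the `exp(coef)` entry printed by `summary(fit)`.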

As you may know, the hazard function is modeled as

$$h(t) = h_0(t)\exp(\beta' x)$$

where $h_0(t)$ is the baseline hazard. The hazards depend multiplicatively on the covariates, and $\exp(\beta_1)$ is the ratio of the hazards between two individuals whose values of $x_1$ differ by one unit when all other covariates are held constant. The ratio of the hazards of any two individuals $i$ and $j$ is $\exp(\beta'(x_i - x_j))$, and is called the hazard ratio (or incidence rate ratio). This ratio is assumed to be constant over time, hence the name *proportional hazards*.
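To spell out why the baseline hazard plays no role in this ratio, here is the one-line derivation from the model above, writing $h_i(t)$ and $h_j(t)$ for the hazards of individuals $i$ and $j$ (notation introduced here only for this step):

$$\frac{h_i(t)}{h_j(t)} = \frac{h_0(t)\exp(\beta' x_i)}{h_0(t)\exp(\beta' x_j)} = \exp\big(\beta'(x_i - x_j)\big)$$

The factor $h_0(t)$ cancels, so the ratio does not depend on $t$. With a single 0/1 treatment indicator, $x_i = 1$ and $x_j = 0$ give $\exp(\beta)$, which is the quantity reported as `exp(coef)`.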

To echo your preceding question about `survreg`, here the form of $h_0(t)$ is left unspecified; more precisely, this is a semi-parametric model in that only the effects of covariates are parametrized, and not the hazard function. In other words, we don’t make any distributional assumption about survival times.

The regression parameters are estimated by maximizing the partial log-likelihood defined by

$$\ell = \sum_f \log\left(\frac{\exp(\beta' x_f)}{\sum_{r(f)} \exp(\beta' x_r)}\right)$$

where the first summation is over all deaths or failures $f$, and the second summation is over all subjects $r(f)$ still alive (and therefore at risk) at the time of failure $f$; this set is known as the *risk set*. Put differently, $\ell$ can be interpreted as the log profile likelihood for $\beta$ after eliminating $h_0(t)$, that is, the log-likelihood in which $h_0(t)$ has been replaced by the function of $\beta$ that maximizes the likelihood with respect to $h_0(t)$ for a fixed vector $\beta$.
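The partial log-likelihood can be coded directly and maximized numerically, which gives a feel for what `coxph()` computes. The sketch below is not how the survival package is implemented (it uses its own iterative fitting routine); it assumes a single covariate, no tied event times, and a made-up data frame `d`. The helper name `partial_loglik` and the search interval `c(-10, 10)` are arbitrary choices for illustration.

```r
library(survival)

# Hypothetical toy data with no tied times, so no tie correction is needed.
d <- data.frame(
  time   = c(5, 8, 12, 14, 20, 21, 25, 30, 33, 40),
  status = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  trt    = c(0, 0, 1, 0, 1, 0, 1, 0, 1, 1)
)

# Partial log-likelihood for a single covariate x.
partial_loglik <- function(beta, time, status, x) {
  eta <- beta * x                       # linear predictor for each subject
  failures <- which(status == 1)        # outer sum runs over observed failures only
  sum(vapply(failures, function(f) {
    at_risk <- time >= time[f]          # risk set at the failure time of f
    eta[f] - log(sum(exp(eta[at_risk])))
  }, numeric(1)))
}

# One-dimensional maximization over beta.
opt <- optimize(partial_loglik, interval = c(-10, 10), maximum = TRUE,
                time = d$time, status = d$status, x = d$trt, tol = 1e-10)

fit <- coxph(Surv(time, status) ~ trt, data = d)

# Compare the hand-rolled estimate and log-likelihood with coxph's.
c(by_hand = opt$maximum,   coxph = unname(coef(fit)))
c(by_hand = opt$objective, coxph = fit$loglik[2])
```

Under these assumptions the two estimates should agree up to numerical tolerance. With tied event times, `coxph()` applies a tie-handling approximation (Efron's method by default), and this simple version would no longer match exactly.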

About censoring, it is not clear whether you refer to left censoring (as might be the case if we consider an origin for the time scale that is earlier than the time when observation began, also called *delayed entry*) or right-censoring. With right-censoring, which is the usual case, a censored subject contributes to the risk sets (the denominators of the partial likelihood) up to its censoring time but never contributes a numerator term, since no failure was observed for it. In any case, more details about the computation of the regression coefficients and how the survival package handles censoring can be found in Therneau and Grambsch, Modeling Survival Data (Springer, 2000). Terry Therneau is the author of the former S package. An online tutorial is available.
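Assuming right-censoring is what is meant, the bookkeeping can be made concrete by listing the risk set at each observed event time. This is a base-R sketch on made-up data (the subject labels `A` to `E` are hypothetical), not a call into the survival package.

```r
# Hypothetical toy data: status 1 = event observed, 0 = right-censored.
time   <- c(5, 8, 12, 14, 20)
status <- c(1, 0, 1, 0, 1)
id     <- c("A", "B", "C", "D", "E")

# Only observed events generate a term in the partial likelihood.
event_idx <- which(status == 1)

# For each event, the risk set is everyone whose time is at least the event time.
risk_sets <- lapply(event_idx, function(f) id[time >= time[f]])
names(risk_sets) <- paste0("t=", time[event_idx])

risk_sets
```

Subject `B`, censored at time 8, is in the risk set for the event at time 5 and drops out afterwards; subject `D`, censored at 14, is at risk at times 5 and 12. Neither produces a term of its own, because the outer sum runs over observed failures only.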

Survival Analysis in R, by David Diez, provides a good introduction to survival analysis in R. A brief overview of $\chi^2$ tests for regression parameters is given on p. 10. Hopefully, this helps clarify the on-line help quoted by @onestop: “coefficients the coefficients of the linear predictor, which multiply the columns of the model matrix.” For an applied textbook, I recommend Analyzing Medical Data Using S-PLUS, by Everitt and Rabe-Hesketh (Springer, 2001, chap. 16 and 17), from which most of the above comes.

Another useful reference is John Fox’s appendix on Cox Proportional-Hazards Regression for Survival Data.

**Attribution** *Source: Link, Question Author: annemphillip, Answer Author: Community*