I wish to create a toy survival (time to event) data which is right censored and follows some distribution with proportional hazards and constant baseline hazard.
I created the data as follows, but I am unable to obtain estimated hazard ratios that are close to the true values after fitting a Cox proportional hazards model to the simulated data.
What did I do wrong?
R codes:
library(survival) #set parameters set.seed(1234) n = 40000 #sample size #functional relationship lambda=0.000020 #constant baseline hazard 2 per 100000 per 1 unit time b_haz <-function(t) #baseline hazard { lambda #constant hazard wrt time } x = cbind(hba1c=rnorm(n,2,.5)-2,age=rnorm(n,40,5)-40,duration=rnorm(n,10,2)-10) B = c(1.1,1.2,1.3) # hazard ratios (model coefficients) hist(x %*% B) #distribution of scores haz <-function(t) #hazard function { b_haz(t) * exp(x %*% B) } c_hf <-function(t) #cumulative hazards function { exp(x %*% B) * lambda * t } S <- function(t) #survival function { exp(-c_hf(t)) } S(.005) S(1) S(5) #simulate censoring time = rnorm(n,10,2) S_prob = S(time) #simulate events event = ifelse(runif(1)>S_prob,1,0) #model fit km = survfit(Surv(time,event)~1,data=data.frame(x)) plot(km) #kaplan-meier plot #Cox PH model fit = coxph(Surv(time,event)~ hba1c+age+duration, data=data.frame(x)) summary(fit) cox.zph(fit)
Results:
Call: coxph(formula = Surv(time, event) ~ hba1c + age + duration, data = data.frame(x)) n= 40000, number of events= 3043 coef exp(coef) se(coef) z Pr(>|z|) hba1c 0.236479 1.266780 0.035612 6.64 3.13e-11 *** age 0.351304 1.420919 0.003792 92.63 < 2e-16 *** duration 0.356629 1.428506 0.008952 39.84 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 exp(coef) exp(-coef) lower .95 upper .95 hba1c 1.267 0.7894 1.181 1.358 age 1.421 0.7038 1.410 1.432 duration 1.429 0.7000 1.404 1.454 Concordance= 0.964 (se = 0.006 ) Rsquare= 0.239 (max possible= 0.767 ) Likelihood ratio test= 10926 on 3 df, p=0 Wald test = 10568 on 3 df, p=0 Score (logrank) test = 11041 on 3 df, p=0
but true values are set as
B = c(1.1,1.2,1.3) # hazard ratios (model coefficients)
Answer
It is not clear to me how you generate your event times (which, in your case, might be $<0$) and event indicators:
time = rnorm(n,10,2)
S_prob = S(time)
event = ifelse(runif(1)>S_prob,1,0)
So here is a generic method, followed by some R code.
Generating survival times to simulate Cox proportional hazards models
To generate event times from the proportional hazards model, we can use the inverse probability method (Bender et al., 2005): if $V$ is uniform on $(0, 1)$ and if $S(\cdot \,|\, \mathbf{x})$ is the conditional survival function derived from the proportional hazards model, i.e.
$$
S(t \,|\, \mathbf{x}) = \exp \left( -H_0(t) \exp(\mathbf{x}^\prime \mathbf{\beta}) \vphantom{\Big(} \right)
$$
then it is a fact that the random variable
$$
T = S^{-1}(V \,|\, \mathbf{x}) = H_0^{-1} \left( – \frac{\log(V)}{\exp(\mathbf{x}^\prime \mathbf{\beta})} \right)
$$
has survival function $S(\cdot \,|\, \mathbf{x})$. This result is known as “the inverse probability integral transformation”. Therefore, to generate a survival time $T \sim S(\cdot \,|\, \mathbf{x})$ given the covariate vector, it suffices to draw $v$ from $V \sim \mathrm{U}(0, 1)$ and to make the inverse transformation $t = S^{-1}(v \,|\, \mathbf{x})$.
Example [Weibull baseline hazard]
Let $h_0(t) = \lambda \rho t^{\rho – 1}$ with shape $\rho > 0$ and scale $\lambda > 0$. Then $H_0(t) = \lambda t^\rho$ and $H^{-1}_0(t) = (\frac{t}{\lambda})^{\frac{1}{\rho}}$. Following the inverse probability method, a realisation of $T \sim S(\cdot \,|\, \mathbf{x})$ is obtained by computing
$$
t = \left( – \frac{\log(v)}{\lambda \exp(\mathbf{x}^\prime \mathbf{\beta})} \right)^{\frac{1}{\rho}}
$$
with $v$ a uniform variate on $(0, 1)$. Using results on transformations of random variables, one may notice that $T$ has a conditional Weibull distribution (given $\mathbf{x}$) with shape $\rho$ and scale $\lambda \exp(\mathbf{x}^\prime \mathbf{\beta})$.
R code
The following R function generates a data set with a single binary covariate $x$ (e.g. a treatment indicator). The baseline hazard has a Weibull form. Censoring times are randomly drawn from an exponential distribution.
# baseline hazard: Weibull
# N = sample size
# lambda = scale parameter in h0()
# rho = shape parameter in h0()
# beta = fixed effect parameter
# rateC = rate parameter of the exponential distribution of C
simulWeib <- function(N, lambda, rho, beta, rateC)
{
# covariate --> N Bernoulli trials
x <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))
# Weibull latent event times
v <- runif(n=N)
Tlat <- (- log(v) / (lambda * exp(x * beta)))^(1 / rho)
# censoring times
C <- rexp(n=N, rate=rateC)
# follow-up times and event indicators
time <- pmin(Tlat, C)
status <- as.numeric(Tlat <= C)
# data set
data.frame(id=1:N,
time=time,
status=status,
x=x)
}
Test
Here is some quick simulation with $\beta = -0.6$:
set.seed(1234)
betaHat <- rep(NA, 1e3)
for(k in 1:1e3)
{
dat <- simulWeib(N=100, lambda=0.01, rho=1, beta=-0.6, rateC=0.001)
fit <- coxph(Surv(time, status) ~ x, data=dat)
betaHat[k] <- fit$coef
}
> mean(betaHat)
[1] -0.6085473
Attribution
Source : Link , Question Author : stats_newb , Answer Author : ocram