[R] Question on Data Simulation
Göran Broström
gb at stat.umu.se
Fri Apr 2 15:48:55 CEST 2004
On Thu, Apr 01, 2004 at 05:11:31PM +0200, Rau, Roland wrote:
> Hello,
>
> > -----Original Message-----
> > From: "Jingky P. Lozano" [SMTP:jlozano at apoy.upm.edu.ph]
> > Sent: Thursday, April 01, 2004 2:15 PM
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] Question on Data Simulation
> >
> > Dear mailing list,
> >
> > Do you know a specific package in R where I can create an artificial
> > survival
> > data with controlled censoring condition? (For example, I want to
> > simulate a
> > data set with 5 variables and 20% of the observations are censored.)
> >
> maybe I misunderstood something but why don't you create a variable,
> often denoted by 'status' in survival analysis, where the probabilities are
> 0.2 for censoring and 0.8 for the event and add this to your other data?
> The code should like like this (for a hypothetical size of 1000
> individuals):
>
> status = sample(x=c(0,1), size=1000, replace=TRUE, prob=c(0.2,0.8))
I have an objection to this solution: If you take an uncensored sample
and regard some of the observations as censored, you actually change the
distribution of the censored observations (You are changing a genuine
observation T = t to T > t). Since I happen to be in need of this kind of
simulated data myself, I wrote the following function, which simulates
exponential censored survival times according to the proportional hazards
model (Type I censoring with common censoring time):
----------------------------------------------------------------------
TypeIsurv <- function(covar, beta, cens.prob = 0, scale = 1){
## covar = n x p matrix of covariates
## beta = p-vector of regression coefficients
## cens.prob = expected censoring fraction (0 <= cens.prob < 1).
## Output: Censored exponential survival times and censoring
## indicators for Type I censoring (common censoring time 'tc').
## Model: proportional hazards,
## h(t; covar, beta) = exp(covar %*% beta)) / scale
## Here we should have some checks of indata.
if (scale <= 0) stop("'scale' must be positive")
if (cens.prob >= 1) return(NULL)
covar <- as.matrix(covar)
n <- NROW(covar)
score <- exp(covar %*% beta)
tt <- rexp(n, score) # Uncensored survival times
if (cens.prob > 0){
## Find the common censoring time 'tc':
type1 <- function(x) mean(exp(-x * score)) - cens.prob
lower <- 0
upper <- -log(cens.prob) / min(score) + 1
tc <- uniroot(type1, c(lower, upper))$root
## Calculate event indicator and observed times:
event <- tt <= tc
tt <- pmin(tt, tc)
}else{
event <- rep(TRUE, n)
}
list(time = scale * tt, event = event, tc = scale * tc)
}
--------------------------------------------------------------------
A word of warning: Not well tested; I only noticed that the censoring
fraction seemed to be correct in some experiments.
Göran
--
Göran Broström tel: +46 90 786 5223
Department of Statistics fax: +46 90 786 6614
Umeå University http://www.stat.umu.se/egna/gb/
SE-90187 Umeå, Sweden e-mail: gb at stat.umu.se
More information about the R-help
mailing list