[R] Create a time interval from a single time variable
David Winsemius
dwinsemius at comcast.net
Wed Jun 3 21:58:33 CEST 2009
Not sure how to functionalize it either. Assuming the data have first
been sorted by ID and DaysEnrolled, what seemed promising would be a
three-step process:
sample1$Stop <- c(sample1[2:nrow(sample1), "DaysEnrolled"], NA)  # shift DaysEnrolled up one row
sample1$next.id <- c(sample1[2:nrow(sample1), "ID"], NA)         # shift ID up one row
is.na(sample1$Stop) <- with(sample1, ID != next.id)               # NA at the end of each ID group
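The three steps above can be wrapped in a function. This is a minimal sketch, assuming one row per visit with columns ID and DaysEnrolled; the name make_intervals() is hypothetical. It also copies DaysEnrolled into Start, as the desired output in the original question shows, and does the sort itself.

```r
# make_intervals() is a hypothetical helper name. It wraps the
# shift-and-mask idea: Stop is the next row's DaysEnrolled, set to NA
# wherever the next row belongs to a different ID.
make_intervals <- function(df) {
  # Sort so each subject's visits are contiguous and in time order
  df <- df[order(df$ID, df$DaysEnrolled), ]
  rownames(df) <- NULL
  # Each interval starts at the current visit
  df$Start <- df$DaysEnrolled
  # Shift DaysEnrolled and ID up one row; the last row has no successor
  df$Stop <- c(df$DaysEnrolled[-1], NA)
  next_id <- c(df$ID[-1], NA)
  # NA at the end of each ID group (including the very last row)
  df$Stop[is.na(next_id) | df$ID != next_id] <- NA
  df
}

visits <- data.frame(ID = c(71622, 71622, 71622, 1436),
                     DaysEnrolled = c(0, 28, 42, 0))
make_intervals(visits)
```

Because the function sorts internally, rows come back in ID order rather than input order. A subject with a single visit (1436 here) gets Stop = NA, like the last row of every subject.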
--
David Winsemius
On Jun 3, 2009, at 2:15 PM, Katschke, Adrian R wrote:
> I am trying to set up a data set for a survival analysis with
> time-varying covariates. The data is already in a long format, but
> does not have a variable to signify the stopping point for the
> interval. The variable DaysEnrolled is the variable I would like to
> use to form this interval. This is what I have now:
>
>      ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV HIVStatus                         LTFUp Start Stop
> 1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
> 9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>
> This is what I would like to have:
>
>      ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV HIVStatus                         LTFUp Start Stop
> 1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
> 2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
> 3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
> 4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
> 5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
> 6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
> 7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
> 8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
> 9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA
>
> I am not sure how to put this in a function. I thought of using
> embed() in tapply().
>
> astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x) {
>   ifelse(length(x) == 1,
>          embed(x, 1),
>          ifelse(length(x) > 1,
>                 embed(x, 2),
>                 NA))
> })
>
> This doesn't do what I thought it would. I know that I could write a
> double loop to look at each subject and the differing number of
> observations for each subject, but would like to avoid that if at
> all possible.
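The tapply() attempt returns a single number per subject because of how ifelse() works, not because of embed(): ifelse() returns a result with the shape of its test, and here the test has length 1. The sketch below shows that truncation and one way embed() could produce the Stop values for one subject; stop_times() is a hypothetical helper name, and the per-subject results would still have to be put back in row order (for example with ave()).

```r
# ifelse() returns a result shaped like its test argument. With a
# length-1 test, only the first element of the chosen branch survives.
x <- c(0, 28, 42, 98)
ifelse(length(x) > 1, embed(x, 2), NA)  # 28, not a matrix

# Padding with a trailing NA lets embed() pair every visit with the next:
# column 1 holds the following visit day (Stop), column 2 the current one.
stop_times <- function(x) embed(c(x, NA), 2)[, 1]

stop_times(x)  # 28 42 98 NA
stop_times(0)  # NA: a single-visit subject has no Stop
```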
>
>
> Sample of 2 subjects:
> sample1 <-
> structure(list(ID = c(71622L, 71622L, 71622L, 71622L, 71622L,
>     71622L, 71622L, 71622L, 71622L, 1436L), Age = c(0.008, 0.085,
>     0.123, 0.277, 0.441, 0.517, 0.594, 0.715, 0.791, 6.968),
>     DaysEnrolled = c(0L, 28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L),
>     HAZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
>     NA_real_, NA_real_, NA_real_, NA_real_), WAZ = c(NA_real_, NA_real_,
>     NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
>     NA_real_), WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
>     NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Food = c(NA_integer_,
>     NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>     NA_integer_, NA_integer_, NA_integer_, NA_integer_), onARV = c(0L,
>     0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HIVStatus = structure(c(2L,
>     2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
>     "HIV exposed, status indeterminate", "HIV infected", "HIV negative"),
>     class = "factor"), LTFUp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA),
>     Start = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>     Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ID", "Age",
>     "DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food", "onARV", "HIVStatus",
>     "LTFUp", "Start", "Stop"), row.names = c(NA, 10L), class = "data.frame")
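Applied to the posted sample, base R's ave() can fill Stop without loops and without reordering rows. This is a sketch, assuming visits are already in time order within each ID; only the ID and DaysEnrolled columns are copied from the sample, and lead_within() is a hypothetical helper name.

```r
# Two columns copied from the sample data above (ID, DaysEnrolled),
# kept in the posted row order: subject 71622 first, then 1436.
visits <- data.frame(
  ID           = c(rep(71622L, 9), 1436L),
  DaysEnrolled = c(0L, 28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L)
)

# Hypothetical helper: next value within a group, NA for the last one
lead_within <- function(v) c(v[-1], NA)

# ave() applies the function to each ID's rows and writes the results
# back in the original row order, so no sorting or looping is needed.
visits$Start <- visits$DaysEnrolled
visits$Stop  <- ave(visits$DaysEnrolled, visits$ID, FUN = lead_within)
visits
```

Rows 1 to 9 reproduce the Stop column of the desired output, and the single-visit subject 1436 gets Stop = NA. Unlike the shift-and-mask approach, this does not require IDs to be contiguous, only that each subject's rows be in time order.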
>
>
> Adrian Katschke
> Biostatistician
> IU Department of Medicine
> Division of Biostatistics
> akatschk at iupui.edu
> 317-278-6665
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT