[R] Create a time interval from a single time variable

David Winsemius dwinsemius at comcast.net
Wed Jun 3 22:15:18 CEST 2009


On Jun 3, 2009, at 3:58 PM, David Winsemius wrote:

> Not sure how to functionalize it either. What seemed promising
> (assuming it has first been sorted by ID and DaysEnrolled) would be
> a three-step process:
>
> sample1$Stop <- c(sample1[2:nrow(sample1),"DaysEnrolled"],NA)  # shift the DaysEnrolled
> sample1$next.id <- c(sample1[2:nrow(sample1),"ID"],NA)         # shift ID
> is.na(sample1$Start) <- with(sample1, ID != next.id)           # NA the ends of ID groups

Rather:
is.na(sample1$Stop) <- with(sample1, ID != next.id)   # NA the ends of ID groups

(Start should be just a copy of DaysEnrolled.)
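
Put together, the corrected steps might be wrapped up roughly as below. This is only a sketch: the helper name add_start_stop and the cut-down toy data frame are made up for illustration, and it assumes the rows are already ordered by ID and, within ID, by DaysEnrolled.

```r
# Hypothetical helper (name invented for this sketch, not from the thread).
# Assumes d is already sorted by ID and, within ID, by DaysEnrolled.
add_start_stop <- function(d) {
  d$Start <- d$DaysEnrolled                # Start is just a copy of DaysEnrolled
  d$Stop  <- c(d$DaysEnrolled[-1], NA)     # shift DaysEnrolled up one row
  next.id <- c(d$ID[-1], NA)               # shift ID up one row
  is.na(d$Stop) <- d$ID != next.id         # NA the ends of ID groups
  d
}

# Toy stand-in for sample1, keeping only the two columns the helper uses
toy <- data.frame(ID = c(71622L, 71622L, 71622L, 1436L),
                  DaysEnrolled = c(0L, 28L, 42L, 0L))
res <- add_start_stop(toy)
print(res)
```

The last row needs no special handling: its shifted ID is NA, so the comparison is NA there, and an NA in a logical index selects nothing when the replacement value has length one. Its Stop is already NA from the shift.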

>
>
> -- 
> David winsemius
>
>
> On Jun 3, 2009, at 2:15 PM, Katschke, Adrian R wrote:
>
>> I am trying to set up a data set for a survival analysis with time- 
>> varying covariates. The data is already in a long format, but does  
>> not have a variable to signify the stopping point for the interval.  
>> The variable DaysEnrolled is the variable I would like to use to  
>> form this interval. This is what I have now:
>>
>>      ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
>> 1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>> 9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0    0
>>
>> This is what I would like to have:
>>
>>      ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
>> 1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
>> 2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
>> 3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
>> 4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
>> 5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
>> 6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
>> 7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
>> 8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
>> 9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA
>>
>> I am not sure how to put this in a function. I thought of using  
>> embed() in tapply().
>>
>> astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x){
>>     ifelse(length(x) == 1,
>>            embed(x,1),
>>            ifelse(length(x) > 1,
>>                   embed(x,2),
>>                   NA))})
>>
>> This doesn't do what I thought it would. I know that I could write
>> a double loop to look at each subject and the differing number of
>> observations for each subject, but would like to avoid that if at
>> all possible.
>>
>>
>> Sample of 2 subjects:
>>          sample1 <-
>> structure(list(ID = c(71622L, 71622L, 71622L, 71622L, 71622L,
>> 71622L, 71622L, 71622L, 71622L, 1436L), Age = c(0.008, 0.085,
>> 0.123, 0.277, 0.441, 0.517, 0.594, 0.715, 0.791, 6.968),  
>> DaysEnrolled = c(0L,
>> 28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L), HAZ = c(NA_real_,
>> NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
>> NA_real_, NA_real_), WAZ = c(NA_real_, NA_real_, NA_real_, NA_real_,
>> NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
>>    WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
>>    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Food =  
>> c(NA_integer_,
>>    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>    NA_integer_, NA_integer_, NA_integer_, NA_integer_), onARV = c(0L,
>>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HIVStatus = structure(c(2L,
>>    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
>>    "HIV exposed, status indeterminate",
>>    "HIV infected", "HIV negative"), class = "factor"), LTFUp = c(0,
>>    0, 0, 0, 0, 0, 0, 0, 0, NA), Start = c(0, 0, 0, 0, 0, 0,
>>    0, 0, 0, 0), Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names =  
>> c("ID",
>> "Age", "DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food", "onARV",
>> "HIVStatus", "LTFUp", "Start", "Stop"), row.names = c(NA, 10L
>> ), class = "data.frame")
>>
>>
>> Adrian Katschke
>> Biostatistician
>> IU Department of Medicine
>> Division of Biostatistics
>> akatschk at iupui.edu
>> 317-278-6665
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



