[R] Coding columns for survival analysis
Alexander Shenkin
ashenkin at ufl.edu
Mon Apr 16 18:19:22 CEST 2012
Jim,
This was very helpful - thank you! I really like the use of diff and
cumsum - those haven't been in my toolkit until now. Your solution came
close, but I needed to keep "NAs" when the tree hadn't been found yet,
or when it had already died. So, for posterity, here's the code I ended
up with:
x <- read.table(text = " tree live1 live2 live3 live4 live5
1 tree1 0 0 0 1 1
2 tree2 0 0 1 1 0
3 tree3 0 1 1 0 0
4 tree4 1 1 0 0 0
5 tree4 1 1 1 1 0 # another test condition
6 tree5 1 0 0 0 0", header = TRUE)
# get matrix of data columns
z <- as.matrix(x[, -1])
# process each row
a <- t(apply(z, 1, function(.row) {
    # replace NAs with 0s so that diff works correctly
    # (not a problem in this example, but it is in the real data)
    .row[is.na(.row)] <- 0
    diffs <- diff(c(0, .row))
    alive <- .row
    found <- diffs == 1
    die <- diffs == -1
    statevec <- rep(1, length(.row))
    statevec <- statevec + alive    # 2 where alive
    statevec <- statevec + found    # 3 where found
    statevec <- statevec + die * 3  # 4 where dead
    c(NA, "alive", "found", "mort")[statevec]
}))
a
  [,1]    [,2]    [,3]    [,4]    [,5]
1 NA      NA      NA      "found" "alive"
2 NA      NA      "found" "alive" "mort"
3 NA      "found" "alive" "mort"  NA
4 "found" "alive" "mort"  NA      NA
5 "found" "alive" "alive" "alive" "mort"
6 "found" "mort"  NA      NA      NA
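
For anyone reading this later, it may help to see the intermediate vectors for a single row, and why the NAs have to be replaced before calling diff. This is only a sketch: recode_row is a hypothetical helper name that does not appear elsewhere in this thread, and the row containing NA is made up for illustration.

```r
# Hypothetical helper wrapping the per-row logic from the code above.
recode_row <- function(.row) {
    .row[is.na(.row)] <- 0          # treat unobserved censuses as "not alive"
    diffs <- diff(c(0, .row))       # +1 where first seen alive, -1 where it dies
    # 1 = NA, 2 = alive, 3 = found (alive + newly seen), 4 = mort
    statevec <- 1 + .row + (diffs == 1) + 3 * (diffs == -1)
    c(NA, "alive", "found", "mort")[statevec]
}

row <- c(0, 0, 1, 1, 0)
diff(c(0, row))         # 0 0 1 0 -1
cumsum(row)             # 0 0 1 2 2  (the quantity Jim's version builds on)

# diff() propagates NA, which is why NAs are replaced first
diff(c(0, NA, 1, 0))    # NA NA -1

res <- recode_row(row)                 # NA NA "found" "alive" "mort"
res_na <- recode_row(c(NA, 1, 1, 0))   # NA "found" "alive" "mort"
```

Note that replacing NA with 0 assumes a missing census means "not seen alive"; if the real data can have a gap between two live observations, that gap would be coded as "mort" followed by a second "found".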
Best,
Allie
On 4/13/2012 7:01 PM, jim holtman wrote:
> try this:
>
>> x <- read.table(text = " tree live1 live2 live3 live4 live5
> + 1 tree1 0 0 0 1 1
> + 2 tree2 0 0 1 1 0
> + 3 tree3 0 1 1 0 0
> + 4 tree4 1 1 0 0 0
> + 6 tree4 1 1 1 1 0 # another test condition
> + 5 tree5 1 0 0 0 0", header = TRUE)
>>
>> # get matrix of data columns
>> z <- as.matrix(x[, -1])
>> # process each row
>> a <- apply(z, 1, function(.row){
> + # determine where found (will be a 2)
> + found <- pmin(cumsum(.row) + 1, 3) # cannot be greater than 3
> + # determined where it died
> + die <- cumsum(diff(c(0, .row)) != 0)
> + # replace value at die == 2 with 4
> + found[die == 2] <- 4
> + c(NA, "found", "alive", "mort")[found]
> + })
>> t(a) # result
>   [,1]    [,2]    [,3]    [,4]    [,5]
> 1 NA      NA      NA      "found" "alive"
> 2 NA      NA      "found" "alive" "mort"
> 3 NA      "found" "alive" "mort"  "mort"
> 4 "found" "alive" "mort"  "mort"  "mort"
> 6 "found" "alive" "alive" "alive" "mort"
> 5 "found" "mort"  "mort"  "mort"  "mort"
>>
>
>
> On Fri, Apr 13, 2012 at 4:53 PM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
>> Hello Folks,
>>
>> I have 5 columns for thousands of tree records that record whether that
>> tree was alive or dead. I want to recode the columns such that the cell
>> reads "found" when a live tree is first observed, "alive" for when a
>> tree is found alive and is not just found, and "mort" when it was
>> previously alive but is now dead.
>>
>> Given the following:
>>
>> > tree_live = data.frame(tree =
>> c("tree1","tree2","tree3","tree4","tree5"), live1 = c(0,0,0,1,1), live2
>> = c(0,0,1,1,0), live3 = c(0,1,1,0,0), live4 = c(1,1,0,0,0), live5 = c(1,
>> 0, 0, 0, 0))
>>
>>    tree live1 live2 live3 live4 live5
>> 1 tree1     0     0     0     1     1
>> 2 tree2     0     0     1     1     0
>> 3 tree3     0     1     1     0     0
>> 4 tree4     1     1     0     0     0
>> 5 tree5     1     0     0     0     0
>>
>> I would like to end up with the following:
>>
>> > tree_live_recode
>>
>>   live1 live2 live3 live4 live5
>> 1    NA    NA    NA found alive
>> 2    NA    NA found alive  mort
>> 3    NA found alive  mort     0
>> 4 found alive  mort     0     0
>> 5 found  mort     0     0     0
>>
>> I've accomplished the recode in the past, but only by going over the
>> dataset multiple times in messy and inefficient fashion. I'm wondering
>> if there are concise and efficient ways of going about it?
>>
>> (I haven't been using the Survival package for my analyses, but I'm
>> starting to look into it.)
>>
>> Thanks,
>> Allie
>>