# [R] Coding columns for survival analysis

Alexander Shenkin ashenkin at ufl.edu
Mon Apr 16 18:19:22 CEST 2012

```
Jim,

This was very helpful - thank you!  I really like the use of diff and
cumsum; those weren't in my toolkit until now.  Your solution came
close, but I needed to keep NAs both before the tree had been found and
after it had died (your version repeats "mort" after death).  So, for
posterity, here's the code I ended up with:

x <- read.table(text = "   tree live1 live2 live3 live4 live5
1 tree1     0     0     0     1     1
2 tree2     0     0     1     1     0
3 tree3     0     1     1     0     0
4 tree4     1     1     0     0     0
5 tree4     1     1     1     1     0  # another test condition
6 tree5     1     0     0     0     0", header = TRUE)

# get matrix of data columns
z <- as.matrix(x[, -1])
# process each row
a <- t(apply(z, 1, function(.row) {
  # replace NAs with 0's so that diff works correctly - not a problem
  # in this example, but it is in the real data
  .row[is.na(.row)] <- 0
  diffs <- diff(c(0, .row))
  alive <- .row
  found <- diffs == 1                # TRUE where the tree first appears
  die <- diffs == -1                 # TRUE where it has just died
  statevec <- rep(1, length(.row))   # 1 (NA) by default
  statevec <- statevec + alive       # 2 where alive
  statevec <- statevec + found       # 3 where found
  statevec <- statevec + die * 3     # 4 where dead
  c(NA, "alive", "found", "mort")[statevec]
}))
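
# Aside (a small illustration, not code from the thread): why the NAs
# are zeroed first.  diff() propagates an NA into both neighbouring
# differences, so the found/die flags around a missing survey would
# themselves be NA.  The example row below is made up for illustration.
row_with_na <- c(0, 1, NA, 1, 0)
diff(c(0, row_with_na))   # 0  1 NA NA -1
# After zeroing, the missing survey reads as a death followed by a
# re-find, so the recode above would label the gap "mort" and the next
# live survey "found" again; worth keeping in mind for the real data.
row_zeroed <- row_with_na
row_zeroed[is.na(row_zeroed)] <- 0
diff(c(0, row_zeroed))    # 0  1 -1  1 -1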

a

  [,1]    [,2]    [,3]    [,4]    [,5]
1 NA      NA      NA      "found" "alive"
2 NA      NA      "found" "alive" "mort"
3 NA      "found" "alive" "mort"  NA
4 "found" "alive" "mort"  NA      NA
5 "found" "alive" "alive" "alive" "mort"
6 "found" "mort"  NA      NA      NA
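
# A sketch of a vectorized alternative (not code from the thread): the
# same state codes computed on the whole matrix at once rather than row
# by row with apply(), which may help with thousands of records.
# `recode_states` and `live_mat` are hypothetical names; the function
# assumes the columns hold 0/1 (or NA) survey results in time order.
recode_states <- function(z) {
  z[is.na(z)] <- 0                               # same NA handling as above
  prev <- cbind(0, z[, -ncol(z), drop = FALSE])  # previous survey, 0 first
  d <- z - prev                                  # row-wise diff(c(0, .row))
  # 1 = NA, 2 = alive, 3 = found, 4 = mort (same codes as statevec)
  code <- 1 + z + (d == 1) + 3 * (d == -1)
  matrix(c(NA, "alive", "found", "mort")[as.vector(code)],
         nrow = nrow(z), dimnames = dimnames(z))
}

# same test data as above, entered directly as a matrix
live_mat <- matrix(c(0, 0, 0, 1, 1,
                     0, 0, 1, 1, 0,
                     0, 1, 1, 0, 0,
                     1, 1, 0, 0, 0,
                     1, 1, 1, 1, 0,
                     1, 0, 0, 0, 0),
                   ncol = 5, byrow = TRUE,
                   dimnames = list(1:6, paste0("live", 1:5)))
recode_states(live_mat)
# On the data read in above: recode_states(as.matrix(x[, -1]))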

Best,
Allie

On 4/13/2012 7:01 PM, jim holtman wrote:
> try this:
>
>> x <- read.table(text = "   tree live1 live2 live3 live4 live5
> +    1 tree1     0     0     0     1     1
> +    2 tree2     0     0     1     1     0
> +    3 tree3     0     1     1     0     0
> +    4 tree4     1     1     0     0     0
> +    6 tree4     1     1     1     1     0  # another test condition
> +    5 tree5     1     0     0     0     0", header = TRUE)
>>
>> # get matrix of data columns
>> z <- as.matrix(x[, -1])
>> # process each row
>> a <- apply(z, 1, function(.row){
> +     # determine where found (will be a 2)
> +     found <- pmin(cumsum(.row) + 1, 3) # cannot be greater than 3
> +     # determine where it died
> +     die <- cumsum(diff(c(0, .row)) != 0)
> +     # replace value at die == 2 with 4
> +     found[die == 2] <- 4
> +     c(NA, "found", "alive", "mort")[found]
> + })
>> t(a)  # result
>   [,1]    [,2]    [,3]    [,4]    [,5]
> 1 NA      NA      NA      "found" "alive"
> 2 NA      NA      "found" "alive" "mort"
> 3 NA      "found" "alive" "mort"  "mort"
> 4 "found" "alive" "mort"  "mort"  "mort"
> 6 "found" "alive" "alive" "alive" "mort"
> 5 "found" "mort"  "mort"  "mort"  "mort"
>>
>
>
> On Fri, Apr 13, 2012 at 4:53 PM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
>> Hello Folks,
>>
>> I have 5 columns for thousands of tree records that record whether the
>> tree was alive or dead.  I want to recode the columns so that a cell
>> reads "found" when a live tree is first observed, "alive" when a tree
>> is observed alive after it was first found, and "mort" when it was
>> previously alive but is now dead.
>>
>> Given the following:
>>
>>    > tree_live = data.frame(tree = c("tree1","tree2","tree3","tree4","tree5"),
>>    +     live1 = c(0,0,0,1,1), live2 = c(0,0,1,1,0), live3 = c(0,1,1,0,0),
>>    +     live4 = c(1,1,0,0,0), live5 = c(1,0,0,0,0))
>>
>>       tree live1 live2 live3 live4 live5
>>    1 tree1     0     0     0     1     1
>>    2 tree2     0     0     1     1     0
>>    3 tree3     0     1     1     0     0
>>    4 tree4     1     1     0     0     0
>>    5 tree5     1     0     0     0     0
>>
>> I would like to end up with the following:
>>
>>    > tree_live_recode
>>
>>      live1 live2 live3 live4 live5
>>    1    NA    NA    NA found alive
>>    2    NA    NA found alive  mort
>>    3    NA found alive  mort     0
>>    4 found alive  mort     0     0
>>    5 found  mort     0     0     0
>>
>> I've accomplished the recode in the past, but only by going over the
>> dataset multiple times in a messy and inefficient fashion.  Is there a
>> concise and efficient way to go about it?
>>
>> (I haven't been using the survival package for my analyses, but I'm
>> starting to look into it.)
>>
>> Thanks,
>> Allie
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help