[R] Data manipulation question

Peter Jepsen PJ at DCE.AU.DK
Thu Nov 6 09:23:13 CET 2008

Dear R-listers,

I am a relatively inexperienced R-user currently migrating from Stata. I
am deeply frustrated by this data manipulation question: I know how I
could do it in Stata, but I cannot make it work in R.

I have a data frame of hospitalization data where each row represents an
admission. I need to know when patients were first discharged, but the
problem is that patients were sometimes transferred between hospital
departments. In my data a transfer looks like a new admission, except
that it has a 'start' date equal to the previous admission's 'stop' date.

Here is an example:

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
# data.frame() keeps start and stop numeric; cbind() would coerce them to text
data <- data.frame(id, start, stop)
#    id start stop
# 1   a     0    6
# 2   a     6   12
# 3   a    17   20
# 4   a    20   30
# 5   b     0    1
# 6   b     1   10
# 7   c     0    3
# 8   c     5   10
# 9   c    10   11
# 10  c    11   30
# 11  c    50   55
# 12  d     0    6

So, what I want to end up with is this:

id start stop
a  0     12   # This patient was transferred at time 6 and discharged at
              # time 12. The admission starting at time 17 is therefore
              # irrelevant.
b  0     10
c  0     3
d  0     6
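
Stated as a rule: within each patient, start at the first admission and follow the rows for as long as the next row's 'start' equals the current row's 'stop'; the 'stop' of the last row in that chain is the first discharge. A minimal base R sketch of that rule follows. It assumes start and stop are numeric and that a transfer always has a start exactly equal to the preceding stop; the helper name first_discharge is hypothetical, chosen only for illustration.

```r
# Example data from above, built with data.frame() so start/stop stay numeric.
id    <- c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1))
start <- c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0)
stop  <- c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
data  <- data.frame(id, start, stop, stringsAsFactors = FALSE)

# Hypothetical helper: reduce one patient's rows to the first continuous stay.
first_discharge <- function(d) {
  d <- d[order(d$start), ]          # make sure admissions are in time order
  n <- nrow(d)
  # is_transfer[i] is TRUE when row i + 1 starts exactly when row i stops
  is_transfer <- d$start[-1] == d$stop[-n]
  # first row that is not followed by a transfer; the last row if none
  last <- match(FALSE, is_transfer, nomatch = n)
  data.frame(id = d$id[1], start = d$start[1], stop = d$stop[last],
             stringsAsFactors = FALSE)
}

# Apply the helper per patient and stack the one-row results.
result <- do.call(rbind, lapply(split(data, data$id), first_discharge))
rownames(result) <- NULL
result
#   id start stop
# 1  a     0   12
# 2  b     0   10
# 3  c     0    3
# 4  d     0    6
```

The sketch treats only exact equality of start and stop as a transfer, so a gap of any length (such as patient c's gap between 3 and 5) counts as a discharge.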

I have tried tons of variations on lapply, sapply, split, for loops, etc.,
all to no avail.

Thank you in advance for any assistance.

Best regards,
Peter Jepsen, MD.
