[R] Data manipulation question
Peter Jepsen
PJ at DCE.AU.DK
Thu Nov 6 13:11:40 CET 2008
Thank you for your prompt assistance, cruz and Bart.
Bart set me on the right track, and I modified his proposal to this:
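f <- function(data){
  # For each stay, find the row (if any) whose start equals this stay's stop,
  # i.e. the transfer that continues it; NA means no transfer followed
  m <- match(data$stop, data$start)
  # First stay not followed by a transfer; if there is no NA at all,
  # fall back to the last row
  n <- min(length(m), which(is.na(m)))
  data$stop[n]
}
by(data, data$id, f)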
It also handles some special cases that do not occur in my small example dataset.
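One such case can be checked with a short sketch. It runs both functions on the example data plus a made-up patient "e" who is transferred twice in a row (0-5, 5-10, 10-20). The names adm, peter_f and bart_f are hypothetical, and the sketch assumes each patient's rows are sorted by start. Both functions give 12, 10, 3 and 6 for patients a to d, but only the match(stop, start) version follows the chain of transfers to the final discharge for "e".

```r
# Example data from the thread, plus a hypothetical patient "e" with two
# consecutive transfers (0-5, 5-10, 10-20)
adm <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), "d", rep("e", 3)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0, 0, 5, 10),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6, 5, 10, 20)
)

# Modified version: first stay whose stop is not the start of another stay
peter_f <- function(d) {
  m <- match(d$stop, d$start)
  n <- min(length(m), which(is.na(m)))
  d$stop[n]
}

# Bart's original proposal
bart_f <- function(d) {
  m <- match(d$start, d$stop) + 1
  if (length(m) == 1 && is.na(m)) m <- 1
  if (length(m) > 1 && is.na(m[2])) m <- 1
  d$stop[min(m, na.rm = TRUE)]
}

# sapply(split(...)) returns a plain named vector instead of a "by" object
sapply(split(adm, adm$id), peter_f)  # a = 12, b = 10, c = 3, d = 6, e = 20
sapply(split(adm, adm$id), bart_f)   # same, except e = 10
```

For "e", Bart's version stops at the first transfer (stop 10), while the modified version only stops at a stay that nothing continues (stop 20).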
Thank you again!
Peter.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of bartjoosen
Sent: 6. november 2008 11:31
To: r-help at r-project.org
Subject: Re: [R] Data manipulation question
How about:
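id <- c(rep("a",4), rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20), c(0,1), c(0,5,10,11,50), c(0))
stop <- c(c(6,12,20,30), c(1,10), c(3,10,11,30,55), c(6))
data <- data.frame(id, start, stop)   # keeps start and stop numeric
f <- function(data){
  # For each start, find the stay whose stop equals it (i.e. this row is a
  # transfer); + 1 moves to the row after that stay
  m <- match(data$start, data$stop) + 1
  # Special case: only one stay, so it must end in discharge
  if (length(m) == 1 && is.na(m)) m <- 1
  # Special case: the second stay is not a transfer, so the first stay
  # already ends in discharge
  if (length(m) > 1 && is.na(m[2])) m <- 1
  data$stop[min(m, na.rm = TRUE)]
}
by(data, data$id, f)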
The if statements in the function handle some special cases (a patient with
only one stay, or a first stay that is not followed by a transfer); in all
other cases the first line does the trick.
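To see where the special cases come from, one can print m from the first line of the function for each patient. This is a sketch; adm and m_by_patient are hypothetical names. For "a" and "b" the smallest non-missing m already points at the right row. For "c" the second stay is not a transfer, so m[2] is NA while later rows still match, and min(m, na.rm = TRUE) alone would pick row 3 (stop 11) instead of row 1. For "d" every m is NA, so min() would return Inf with a warning.

```r
# Example data from the thread
adm <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), "d"),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# Only the first line of the function, evaluated separately per patient
m_by_patient <- lapply(split(adm, adm$id),
                       function(d) match(d$start, d$stop) + 1)
m_by_patient
# $a: NA 2 NA 4   $b: NA 2   $c: NA NA 3 4 NA   $d: NA
```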
I would also add that naming an object data is somewhat bad practice, as the
name clashes with R's built-in data() function.
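Strictly speaking, the function is not lost: when R looks up a name in a function call it skips non-function objects, so data(...) keeps working, but the bare name data now refers to the data frame, which is easy to confuse. A small check of this behaviour, with a throwaway data frame as a stand-in:

```r
data <- data.frame(x = 1:3)                   # a global object named "data"
is_df   <- is.data.frame(data)                # TRUE: the bare name means the data frame
has_fun <- exists("data", mode = "function")  # TRUE: a function called data is still found
rm(data)                                      # remove the global object again
c(is_df, has_fun)
```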
I also changed the way you built the data.frame: cbind() turns everything
into a character matrix first, so your method converts all columns,
including start and stop, to factors.
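A sketch of the difference (ids, starts and stops are stand-in names chosen to avoid reusing built-in names). Whether start and stop end up as factors or as character depends on the R version, since stringsAsFactors defaulted to TRUE before R 4.0.0; in either case they are no longer numeric.

```r
ids    <- c(rep("a", 4), rep("b", 2), rep("c", 5), "d")
starts <- c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0)
stops  <- c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)

# Original approach: cbind() builds a character matrix before the data frame
via_cbind <- as.data.frame(cbind(id = ids, start = starts, stop = stops))
# data.frame() keeps each column's own type
via_df <- data.frame(id = ids, start = starts, stop = stops)

sapply(via_cbind, class)  # start/stop: "factor" (R < 4.0.0) or "character"
sapply(via_df, class)     # start/stop: "numeric"
```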
Good luck
Bart
Peter Jepsen wrote:
>
> Dear R-listers,
>
> I am a relatively inexperienced R user currently migrating from Stata.
> I am deeply frustrated by this data manipulation question: I know how I
> could do it in Stata, but I cannot make it work in R.
>
> I have a data frame of hospitalization data where each row represents
> an admission. I need to know when patients were first discharged, but the
> problem is that patients were sometimes transferred between hospital
> departments. In my data a transfer looks like a new admission, except
> that it has a 'start' date equal to the previous admission's 'stop'
> date.
>
> Here is an example:
>
> id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
> start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
> stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
> data <- as.data.frame(cbind(id,start,stop))
> data
> #    id start stop
> # 1   a     0    6
> # 2   a     6   12
> # 3   a    17   20
> # 4   a    20   30
> # 5   b     0    1
> # 6   b     1   10
> # 7   c     0    3
> # 8   c     5   10
> # 9   c    10   11
> # 10  c    11   30
> # 11  c    50   55
> # 12  d     0    6
>
> So, what I want to end up with is this:
>
> id start stop
> a      0   12  # This patient was transferred at time 6 and discharged at
>                # time 12. The admission starting at time 17 is therefore irrelevant.
> b      0   10
> c      0    3
> d      0    6
>
> I have tried tons of variations on lapply, sapply, split, for, etc.,
> all to no avail.
>
> Thank you in advance for any assistance.
>
> Best regards,
> Peter Jepsen, MD.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
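The rule described in the question (a transfer is a stay whose start equals the same patient's previous stop) can also be applied directly, without match(). The sketch below is an alternative that was not proposed in the thread; first_discharge() and adm are hypothetical names, and it assumes stays can be ordered by start within each patient. It numbers each patient's episodes (a new episode begins at every stay that is not a transfer), keeps episode 1, and takes its earliest start and latest stop.

```r
adm <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), "d"),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# Hypothetical helper: first admission episode per patient, merging transfers
first_discharge <- function(d) {
  d <- d[order(d$id, d$start), ]
  # Previous stop within the same patient (NA on each patient's first row)
  prev_stop <- ave(d$stop, d$id, FUN = function(s) c(NA, s[-length(s)]))
  # A transfer starts exactly when the previous stay ended
  is_transfer <- !is.na(prev_stop) & d$start == prev_stop
  # Episode counter: increases by one on every stay that is not a transfer
  episode <- ave(as.numeric(!is_transfer), d$id, FUN = cumsum)
  first <- d[episode == 1, ]
  # Earliest start and latest stop of episode 1 (both sorted by id)
  out <- aggregate(stop ~ id, data = first, FUN = max)
  out$start <- aggregate(start ~ id, data = first, FUN = min)$start
  out[, c("id", "start", "stop")]
}

first_discharge(adm)
```

On the example data this should reproduce the desired table (start 0 and stop 12, 10, 3 and 6 for a to d). Because each stay is compared only with the one right before it, a chain of several transfers is merged into a single episode.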
--
View this message in context:
http://www.nabble.com/Data-manipulation-question-tp20356835p20358624.html
Sent from the R help mailing list archive at Nabble.com.