[R] Data manipulation question

Peter Jepsen PJ at DCE.AU.DK
Thu Nov 6 13:11:40 CET 2008


Thank you for your prompt assistance, cruz and Bart. 

Bart set me on the right track, and I modified his proposal to this:

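f <- function(data){
	# For each stop time, the index of the row whose start time equals it
	# (NA if no admission starts at that time)
	m <- match(data$stop,data$start)
	# First row not followed by a transfer; length(m) is the fallback if no NA is found
	n <- min(length(m),which(is.na(m)))
	data$stop[n]
}
by(data,data$id,f)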

It also handles some special cases outside my small example dataset.
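To make the logic of the function above easier to follow, here is a small sketch that traces it on the example data. The names adm and first_discharge are hypothetical, chosen only for this illustration, and the sketch assumes that rows are sorted by start time within each patient. The last check uses a made-up patient with two consecutive transfers, a case that does not occur in the example data.

```r
# Example data from the thread; 'adm' is a hypothetical name that avoids masking data()
adm <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# Same logic as f above, under a hypothetical name
first_discharge <- function(d) {
  m <- match(d$stop, d$start)           # row that starts when this row stops, else NA
  n <- min(length(m), which(is.na(m)))  # first row not followed by a transfer
  d$stop[n]
}

# Trace for patient "a": rows 1 and 3 are followed by a transfer, rows 2 and 4 are not
a <- adm[adm$id == "a", ]
match(a$stop, a$start)  # 2 NA 4 NA

# One value per patient, as a named vector: a = 12, b = 10, c = 3, d = 6
res <- sapply(split(adm, adm$id), first_discharge)
res

# Made-up patient with two consecutive transfers (0-6, 6-12, 12-20)
chain <- data.frame(start = c(0, 6, 12), stop = c(6, 12, 20))
first_discharge(chain)  # 20
```

Using sapply over split returns a plain named vector, which may be easier to merge back into a data frame than the object returned by by().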

Thank you again!
Peter.


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of bartjoosen
Sent: 6. november 2008 11:31
To: r-help at r-project.org
Subject: Re: [R] Data manipulation question


How about: 

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1)) 
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0)) 
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6)) 
data <- data.frame(id,start,stop)

f <- function(data){
	m <- match(data$start,data$stop) + 1
	if (length(m)==1 && is.na(m)) m <- 1 
	if (length(m) > 1 && is.na(m[2])) m <- 1
	data$stop[min(m,na.rm=TRUE)]
}

by(data,data$id,f)

The if statements in the function handle some special cases; in all
other cases the first line will do the trick.
I would like to add that using data as an object name is somewhat bad
practice, as it masks R's built-in data function.
I also changed the way you built the data.frame, as your method would
convert everything to factors.
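The point about the data.frame can be checked with a short sketch. The object names via_cbind and direct are hypothetical. cbind first builds a character matrix, so the numeric columns are lost; whether they end up as factors or as character strings depends on the R version (factors were the default before R 4.0.0), but in neither case are they numeric.

```r
# cbind() coerces everything to a character matrix before as.data.frame() sees it
via_cbind <- as.data.frame(cbind(id = c("a", "a", "b"),
                                 start = c(0, 6, 0),
                                 stop = c(6, 12, 1)))

# data.frame() keeps each column's own type
direct <- data.frame(id = c("a", "a", "b"),
                     start = c(0, 6, 0),
                     stop = c(6, 12, 1))

sapply(via_cbind, class)     # factor or character, depending on the R version
sapply(direct, class)        # start and stop are numeric

is.numeric(via_cbind$start)  # FALSE
is.numeric(direct$start)     # TRUE
```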

Good luck

Bart



Peter Jepsen wrote:
> 
> Dear R-listers,
> 
> I am a relatively inexperienced R-user currently migrating from Stata. I
> am deeply frustrated by this data manipulation question: I know how I
> could do it in Stata, but I cannot make it work in R.
> 
> I have a data frame of hospitalization data where each row represents an
> admission. I need to know when patients were first discharged, but the
> problem is that patients were sometimes transferred between hospital
> departments. In my data a transfer looks like a new admission, except
> that it has a 'start' date equal to the previous admission's 'stop'
> date.
> 
> Here is an example:
> 
> id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
> start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
> stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
> data <- as.data.frame(cbind(id,start,stop))
> data
> #    id start stop
> # 1   a     0    6
> # 2   a     6   12
> # 3   a    17   20
> # 4   a    20   30
> # 5   b     0    1
> # 6   b     1   10
> # 7   c     0    3
> # 8   c     5   10
> # 9   c    10   11
> # 10  c    11   30
> # 11  c    50   55
> # 12  d     0    6
> 
> So, what I want to end up with is this:
> 
> id start stop
> a  0     12   # This patient was transferred at time 6 and discharged at
>               # time 12. The admission starting at time 17 is therefore
>               # irrelevant.
> b  0     10   
> c  0     3    
> d  0     6
> 
> I have tried tons of variations on lapply, sapply, split, for, etc.,
> all to no avail.
> 
> Thank you in advance for any assistance.
> 
> Best regards,
> Peter Jepsen, MD.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context:
http://www.nabble.com/Data-manipulation-question-tp20356835p20358624.html
Sent from the R help mailing list archive at Nabble.com.
