[R] R_closest date

arun smartpink111 at yahoo.com
Sun Sep 2 18:12:30 CEST 2012


Hi,
No problem.

If you use join() from the plyr package instead of merge(), the original column order of dat1 is kept.

dat3 <- aggregate(DAYS_DIFF ~ PT_ID, data = dat1, min)
library(plyr)
join(dat1, dat3, type = "inner")
#Joining by: PT_ID, DAYS_DIFF
#  PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
#1  4549 2002-08-21 2002-08-20        -1       183        2
#2  4839 2006-11-28 2006-11-28         0       179        2
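
One caveat, as a rough sketch rather than a definitive answer: min(DAYS_DIFF) returns the most negative value, so it matches the closest date only when no patient has an observation far before IDX_DT (with -100 and 5, it would pick -100). The original question asked for the smallest absolute difference. The error in the posted function came from passing abs(baseline$DAYS_DIFF) as a column index instead of taking abs() of the selected values. The base R version below works on absolute values and keeps the column order without an extra package. The names abs_diff, is_closest, closest and closest_one are only illustrative, and keeping the first row on a tie is an assumption about what is wanted.

```r
# Sample data copied from the thread (row names dropped for brevity).
dat1 <- read.table(text = "
PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
4549 2002-08-21 2002-08-20 -1 183 2
4549 2002-08-21 2002-11-14 85 91 1
4549 2002-08-21 2003-02-18 181 89 1
4549 2002-08-21 2003-05-15 267 109 2
4549 2002-08-21 2003-12-16 482 96 1
4839 2006-11-28 2006-11-28 0 179 2
", header = TRUE)

# Absolute distance in days between OBS_DATE and IDX_DT.
abs_diff <- abs(dat1$DAYS_DIFF)

# ave() returns the per-patient minimum repeated for every row of that
# patient, so it can be compared row by row with abs_diff.
is_closest <- abs_diff == ave(abs_diff, dat1$PT_ID, FUN = min)

# Rows at the minimum distance; ties (e.g. -3 and +3) give several rows.
closest <- dat1[is_closest, ]

# Assumption: on a tie, keep only the first row for each patient.
closest_one <- closest[!duplicated(closest$PT_ID), ]
closest_one
```

If all tied rows should be kept, use closest instead of closest_one.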
A.K.






________________________________
From: Weijia Wang <wwang.nyu at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Saturday, September 1, 2012 5:11 PM
Subject: Re: [R] R_closest date


Thank you Arun, for your help again.

Best
______________________________
WANG WEIJIA
Graduate Research and Teaching Assistant
Department of Environmental Medicine
New York University, School of Medicine
wwang.nyu at gmail.com




On Sep 1, 2012, at 5:04 PM, arun <smartpink111 at yahoo.com> wrote:

>Hi,
>Try this:
>dat1 <- read.table(text="
>  PT_ID    IDX_DT  OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
>13  4549 2002-08-21 2002-08-20        -1      183        2
>14  4549 2002-08-21 2002-11-14        85        91        1
>15  4549 2002-08-21 2003-02-18      181        89        1
>16  4549 2002-08-21 2003-05-15      267      109        2
>17  4549 2002-08-21 2003-12-16      482        96        1
>128  4839 2006-11-28 2006-11-28        0      179        2
>", header=TRUE)
>dat3<-aggregate(DAYS_DIFF~PT_ID,data=dat1,min)
>merge(dat1,dat3)
>#  PT_ID DAYS_DIFF     IDX_DT   OBS_DATE OBS_VALUE CATEGORY
>#1  4549        -1 2002-08-21 2002-08-20       183        2
>#2  4839         0 2006-11-28 2006-11-28       179        2
>
>#or,
>dat2<- tapply(dat1$DAYS_DIFF,dat1$PT_ID,min)
>dat4<-data.frame(PT_ID=row.names(data.frame(dat2)),DAYS_DIFF=dat2)
> row.names(dat4)<-1:nrow(dat4)
>merge(dat1,dat4)
>#  PT_ID DAYS_DIFF     IDX_DT   OBS_DATE OBS_VALUE CATEGORY
>#1  4549        -1 2002-08-21 2002-08-20       183        2
>#2  4839         0 2006-11-28 2006-11-28       179        2
>A.K.
>
>
>
>
>
>----- Original Message -----
>From: WANG WEIJIA <wwang.nyu at gmail.com>
>To: "r-help at R-project.org" <r-help at r-project.org>
>Cc: 
>Sent: Saturday, September 1, 2012 1:10 PM
>Subject: [R] R_closest date
>
>Hi, 
>
>I have run into a problem finding the date closest to another date.
>
>This is what the data frame looks like:
>
>    PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
>13   4549 2002-08-21 2002-08-20        -1       183        2
>14   4549 2002-08-21 2002-11-14        85        91        1
>15   4549 2002-08-21 2003-02-18       181        89        1
>16   4549 2002-08-21 2003-05-15       267       109        2
>17   4549 2002-08-21 2003-12-16       482        96        1
>128  4839 2006-11-28 2006-11-28         0       179        2
>
>I need to find the single observation whose 'OBS_DATE' is closest to 'IDX_DT'.
>
>For example, for 'PT_ID' of 4549, I need row 13, of which the OBS_DATE is just one day away from IDX_DT. 
>
>I was thinking about using abs(), and I got this:
>
>baseline<- function(x){
>+  #remove all unnecessary variables
>+  baseline<- x[,c("PT_ID","DAYS_DIFF")]
>+  #get a list of every unique ID
>+  uniqueID <- unique(baseline$PT_ID)
>+  #make a vector that will contain the smallest DAYS_DIFF
>+  first <- rep(-99,length(uniqueID))
>+  i = 1
>+  #loop through each unique ID
>+  for (PT_ID in uniqueID){
>+  #for each iteration get the smallest DAYS_DIFF for that ID
>+  first[i] <- min(baseline[which(baseline$PT_ID==PT_ID),abs(baseline$DAYS_DIFF)])
>+  #up the iteration counter
>+  i = i + 1
>+  }
>+  #make a data frame with the lowest DAYS_DIFF and ID
>+  newdata <- data.frame(uniqueID,first)
>+  names(newdata) <- c("PT_ID","DAYS_DIFF")
>+  #return the data frame containing the lowest GPI for each ID
>+  return(newdata)
>+  }
>
>ldl.b<-baseline(ldl) #get all baseline ldl patient ID, total 11368 obs, all unique#
>Error in `[.data.frame`(baseline, which(baseline$PT_ID == PT_ID), abs(baseline$DAYS_DIFF)) : 
>  undefined columns selected
>
>Can anyone help me figure out how to get the minimum of the absolute value of DAYS_DIFF for each unique ID?
>
>Thanks a lot