[R] Assign date according to defined time interval

arun smartpink111 at yahoo.com
Wed Oct 16 19:48:11 CEST 2013


HI Weijia,

Please check whether this is what you wanted.

Weijia <- load("/home/arunksa111/Downloads/arun_help.RData" )
a[sapply(a,is.factor)] <-lapply(a[sapply(a,is.factor)],as.character)
str(a)
 b[sapply(b,is.factor)] <- lapply(b[sapply(b,is.factor)],as.character)
str(b)
 b$DT_ETP <- as.Date(b$DT_ETP,"%d-%b-%y") 
 b$DT_END <- c(b$DT_ETP[-1]-1, b$DT_ETP[length(b$DT_ETP)]+6)
res <- do.call(rbind,lapply(split(a,list(a$COUNTRY,a$SITEID),drop=TRUE),function(x){
x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y") 
do.call(cbind,lapply(split(b,b$TMPT),function(y) {
 sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_ETP)
}))
 }))

 colnames(res) <- paste0("WEEK",colnames(res))



A.K.




On Wednesday, October 16, 2013 10:48 AM, Weijia Wang <zeleehom at gmail.com> wrote:

Hi, Arun

Here I attached a R object with two dataframes.

The first one is the one I need to count the number of dates that fall into a certain week interval.

For example, 

   STUDYID COUNTRY SITEID SUBJID   SCRNDT    
1 GRTMD101     USA     13        130101    4-Dec-12 
2 GRTMD101     USA     13        130102    4-Dec-12    
3 GRTMD101     USA     13        130103    4-Dec-12  
4 GRTMD101     USA      6         60101      5-Dec-12 
5 GRTMD101     USA      5         50101      5-Dec-12   
6 GRTMD101     USA     13        130104    6-Dec-12

So I will need to count number so dates under 'SCRNDT' for each unique 'SITEID' that falls into the 
following 'DT_ETP'

 STUDYID       DT_ETP    TMPT 
1 GRTMD101   9-Nov-12     1       
2 GRTMD101  16-Nov-12    2    
3 GRTMD101  23-Nov-12    3       
4 GRTMD101  30-Nov-12    4    
5 GRTMD101   7-Dec-12     5      
6 GRTMD101  14-Dec-12    6   

For example, for site 13, the 1st and 2nd SCRNDT both are 4-Dec-12 that falls into the week from '30-Nov-12' to '7-Dec-12', then in the result dataset, it should say, 2 under the variable 'week 4', and the result data frame should look like this:

SITEID WEEK1 WEEK2 WEEK3 WEEK4.......WEEK64   (The values are just an example)
     1           0            0          1            2                 3
     2           1            2          3            5                 6
     .             .            .           .             .                 .

Here is a code I modified based on the one you sent me, but quite intuitively, and R returns an empty vector to me, lol, 


 library(plyr)
 res <- ddply(a,.(COUNTRY, SITEID), function(x) {
            x$SCRNDT <-as.Date(x$SCRNDT,"%d-%b-%y")
                               unsplit(lapply(split(b,b$TMPT),function(y) {
        numweek<-as.numeric(length(unique(b$TMPT)))
        for (i in 1:numweek) {
                                             y$DT_ETP <- as.Date(y$DT_ETP, "%d-%b-%y")
                                             sum(x$SCRNDT > y$DT_ETP[i] & x$SCRNDT <= y$DT_ETP[i+1])}}), 
             b$TMPT)})




I really hope you can teach me on this one again! Thank you so much!!

W





2013/10/15 arun <smartpink111 at yahoo.com>

HI Weijia,
>No problem.
>Let me know if it works.  One slight modification as I noticed that your weeks are not overlapping
>#change
>
>
>
>sum(x$Date > y$from & x$Date <= y$to)
>
>#to
>sum(x$Date >= y$from & x$Date <= y$to
>
>Regards,
>Arun
>
>
>
>
>
>
>On Tuesday, October 15, 2013 8:14 PM, Weijia wang <zeleehom at gmail.com> wrote:
>Thank you Arun! I will try it and get back to you! You are amazing!
>
>Weijia Wang
>
>
>> On Oct 15, 2013, at 8:00 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>> Hi,
>>
>> Please use ?dput() to show the dataset.  Also, it is not clear about how you store the time interval.
>> dat <- read.table(text="
>> GroupID        Date
>> 1      1      10-Dec-12
>> 2      1      11-Dec-12
>> 3      2      13-Dec-12
>> 4      2      15-Dec-12
>> 5      3      06-Dec-12
>> 6      3      19-Dec-12",sep="",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>
>>  dat2 <- data.frame(Week=1:2, from= c("9-Dec-12", "16-Dec-12"), to=c("15-Dec-12","22-Dec-12"),stringsAsFactors=FALSE)
>>
>> #Check ?findInterval()
>>
>> res <- t(sapply(split(dat,dat$GroupID), function(x) {
>>                             x$Date <-as.Date(x$Date,"%d-%b-%y")
>>                              unsplit(lapply(split(dat2,dat2$Week),function(y) {
>>                                         y$from <- as.Date(y$from, "%d-%b-%y")
>>                                          y$to <- as.Date(y$to, "%d-%b-%y")
>>                                          sum(x$Date > y$from & x$Date <= y$to)}),
>>                                                     dat2$Week)
>>                           }))
>>
>> colnames(res) <- paste0("Week",1:2)
>>  res
>> #  Week1 Week2
>> #1     2     0
>> #2     2     0
>> #3     0     1
>>
>>
>> A.K.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tuesday, October 15, 2013 6:24 PM, Weijia Wang <zeleehom at gmail.com> wrote:
>> Hi, I have something very interesting:
>>
>> Say I have this:
>>
>> GroupID         Date
>> 1      1       10-Dec-12
>> 2      1       11-Dec-12
>> 3      2       13-Dec-12
>> 4      2       15-Dec-12
>> 5      3       06-Dec-12
>> 6      3       19-Dec-12
>>
>> Now, I have time interval,
>>
>> week 1: from 9-Dec-12 to 15-Dec-12,
>>
>> week 2: from 16-Dec-12 to 22-Dec-12, and so on.
>>
>> Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not
>> be counted, 6th row falls into week2.
>>
>> Therefore, by GroupID, I will have
>>
>> GroupID=1, Week1=2, Week2=0
>> GroupID=2, Week1=2, Week2=0
>> GroupID=3, Week1=0, Week2=1.
>>
>> I just want to count the valid date that falls into a 7-day week interval,
>> and I shall have new variables for EACH WEEK, and the counts for dates that
>> fall into this week interval.
>>
>> Can anyone please help me on programming this?
>>
>> W
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


-- 

--Sent via Gmail

Weijia Wang
Division of Epidemiology

Department of Population Health, School of Medicine
New York University, NY, 10016



More information about the R-help mailing list