[R] Assign date according to defined time interval

arun smartpink111 at yahoo.com
Wed Oct 16 20:58:44 CEST 2013


Hi Weijia,
This will give you the rownames of the split variables.

lst1 <- split(a,list(a$COUNTRY,a$SITEID))


 res <- t(sapply(lst1,function(x) {
                x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y")
                unlist(lapply(split(b,b$TMPT),function(y){
                  sum(x$SCRNDT >= y$DT_ETP & x$SCRNDT <= y$DT_END)
                    }))
                 }))
 colnames(res) <- paste0("WEEK",colnames(res))
 res[5:8,1:4]
#      WEEK1 WEEK2 WEEK3 WEEK4
#USA.5     0     0     0     4
#USA.6     0     0     0     1
#USA.8     0     0     0     0
#USA.9     0     0     0     0


A.K.




On , arun <smartpink111 at yahoo.com> wrote:
Yes, Sorry a typo.  I was in a hurry when I sent it.









On Wednesday, October 16, 2013 2:32 PM, Weijia Wang <zeleehom at gmail.com> wrote:

Arun

I think you mean 

 sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_END)

instead of DT_ETP at the end of this line, right?

W



2013/10/16 arun <smartpink111 at yahoo.com>

HI Weijia,
>
>Please check whether this is what you wanted.
>
>Weijia <- load("/home/arunksa111/Downloads/arun_help.RData" )
>a[sapply(a,is.factor)] <-lapply(a[sapply(a,is.factor)],as.character)
>str(a)
> b[sapply(b,is.factor)] <- lapply(b[sapply(b,is.factor)],as.character)
>str(b)
> b$DT_ETP <- as.Date(b$DT_ETP,"%d-%b-%y")
> b$DT_END <- c(b$DT_ETP[-1]-1, b$DT_ETP[length(b$DT_ETP)]+6)
>res <- do.call(rbind,lapply(split(a,list(a$COUNTRY,a$SITEID),drop=TRUE),function(x){
>x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y")
>do.call(cbind,lapply(split(b,b$TMPT),function(y) {
> sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_ETP)
>}))
> }))
>
> colnames(res) <- paste0("WEEK",colnames(res))
>
>
>
>A.K.
>
>
>
>
>
>On Wednesday, October 16, 2013 10:48 AM, Weijia Wang <zeleehom at gmail.com> wrote:
>
>Hi, Arun
>
>Here I attached a R object with two dataframes.
>
>The first one is the one I need to count the number of dates that fall into a certain week interval.
>
>For example,
>
>   STUDYID COUNTRY SITEID SUBJID   SCRNDT   
>1 GRTMD101     USA     13        130101    4-Dec-12 
>2 GRTMD101     USA     13        130102    4-Dec-12   
>3 GRTMD101     USA     13        130103    4-Dec-12  
>4 GRTMD101     USA      6         60101      5-Dec-12 
>5 GRTMD101     USA      5         50101      5-Dec-12   
>6 GRTMD101     USA     13        130104    6-Dec-12
>
>So I will need to count number so dates under 'SCRNDT' for each unique 'SITEID' that falls into the
>following 'DT_ETP'
>
> STUDYID       DT_ETP    TMPT
>1 GRTMD101   9-Nov-12     1       
>2 GRTMD101  16-Nov-12    2    
>3 GRTMD101  23-Nov-12    3       
>4 GRTMD101  30-Nov-12    4    
>5 GRTMD101   7-Dec-12     5      
>6 GRTMD101  14-Dec-12    6   
>
>For example, for site 13, the 1st and 2nd SCRNDT both are 4-Dec-12 that falls into the week from '30-Nov-12' to '7-Dec-12', then in the result dataset, it should say, 2 under the variable 'week 4', and the result data frame should look like this:
>
>SITEID WEEK1 WEEK2 WEEK3 WEEK4.......WEEK64   (The values are just an example)
>     1           0            0          1            2                 3
>     2           1            2          3            5                 6
>     .             .            .           .             .                 .
>
>Here is a code I modified based on the one you sent me, but quite intuitively, and R returns an empty vector to me, lol,
>
>
> library(plyr)
> res <- ddply(a,.(COUNTRY, SITEID), function(x) {
>            x$SCRNDT <-as.Date(x$SCRNDT,"%d-%b-%y")
>                               unsplit(lapply(split(b,b$TMPT),function(y) {
>        numweek<-as.numeric(length(unique(b$TMPT)))
>        for (i in 1:numweek) {
>                                             y$DT_ETP <- as.Date(y$DT_ETP, "%d-%b-%y")
>                                             sum(x$SCRNDT > y$DT_ETP[i] & x$SCRNDT <= y$DT_ETP[i+1])}}), 
>             b$TMPT)})
>
>
>
>
>I really hope you can teach me on this one again! Thank you so much!!
>
>W
>
>
>
>
>
>2013/10/15 arun <smartpink111 at yahoo.com>
>
>HI Weijia,
>>No problem.
>>Let me know if it works.  One slight modification as I noticed that your weeks are not overlapping
>>#change
>>
>>
>>
>>sum(x$Date > y$from & x$Date <= y$to)
>>
>>#to
>>sum(x$Date >= y$from & x$Date <= y$to
>>
>>Regards,
>>Arun
>>
>>
>>
>>
>>
>>
>>On Tuesday, October 15, 2013 8:14 PM, Weijia wang <zeleehom at gmail.com> wrote:
>>Thank you Arun! I will try it and get back to you! You are amazing!
>>
>>Weijia Wang
>>
>>
>>> On Oct 15, 2013, at 8:00 PM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>> Hi,
>>>
>>> Please use ?dput() to show the dataset.  Also, it is not clear about how you store the time interval.
>>> dat <- read.table(text="
>>> GroupID        Date
>>> 1      1      10-Dec-12
>>> 2      1      11-Dec-12
>>> 3      2      13-Dec-12
>>> 4      2      15-Dec-12
>>> 5      3      06-Dec-12
>>> 6      3      19-Dec-12",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>
>>>
>>>  dat2 <- data.frame(Week=1:2, from= c("9-Dec-12", "16-Dec-12"), to=c("15-Dec-12","22-Dec-12"),stringsAsFactors=FALSE)
>>>
>>> #Check ?findInterval()
>>>
>>> res <- t(sapply(split(dat,dat$GroupID), function(x) {
>>>                             x$Date <-as.Date(x$Date,"%d-%b-%y")
>>>                              unsplit(lapply(split(dat2,dat2$Week),function(y) {
>>>                                         y$from <- as.Date(y$from, "%d-%b-%y")
>>>                                          y$to <- as.Date(y$to, "%d-%b-%y")
>>>                                          sum(x$Date > y$from & x$Date <= y$to)}),
>>>                                                     dat2$Week)
>>>                           }))
>>>
>>> colnames(res) <- paste0("Week",1:2)
>>>  res
>>> #  Week1 Week2
>>> #1     2     0
>>> #2     2     0
>>> #3     0     1
>>>
>>>
>>> A.K.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tuesday, October 15, 2013 6:24 PM, Weijia Wang <zeleehom at gmail.com> wrote:
>>> Hi, I have something very interesting:
>>>
>>> Say I have this:
>>>
>>> GroupID         Date
>>> 1      1       10-Dec-12
>>> 2      1       11-Dec-12
>>> 3      2       13-Dec-12
>>> 4      2       15-Dec-12
>>> 5      3       06-Dec-12
>>> 6      3       19-Dec-12
>>>
>>> Now, I have time interval,
>>>
>>> week 1: from 9-Dec-12 to 15-Dec-12,
>>>
>>> week 2: from 16-Dec-12 to 22-Dec-12, and so on.
>>>
>>> Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not
>>> be counted, 6th row falls into week2.
>>>
>>> Therefore, by GroupID, I will have
>>>
>>> GroupID=1, Week1=2, Week2=0
>>> GroupID=2, Week1=2, Week2=0
>>> GroupID=3, Week1=0, Week2=1.
>>>
>>> I just want to count the valid date that falls into a 7-day week interval,
>>> and I shall have new variables for EACH WEEK, and the counts for dates that
>>> fall into this week interval.
>>>
>>> Can anyone please help me on programming this?
>>>
>>> W
>>>
>>>     [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>--
>
>--Sent via Gmail
>
>Weijia Wang
>Division of Epidemiology
>
>Department of Population Health, School of Medicine
>New York University, NY, 10016
>


-- 

--Sent via Gmail

Weijia Wang
Division of Epidemiology

Department of Population Health, School of Medicine
New York University, NY, 10016



More information about the R-help mailing list