[R] Assign date according to defined time interval
arun
smartpink111 at yahoo.com
Wed Oct 16 19:48:11 CEST 2013
HI Weijia,
Please check whether this is what you wanted.
Weijia <- load("/home/arunksa111/Downloads/arun_help.RData" )
a[sapply(a,is.factor)] <-lapply(a[sapply(a,is.factor)],as.character)
str(a)
b[sapply(b,is.factor)] <- lapply(b[sapply(b,is.factor)],as.character)
str(b)
b$DT_ETP <- as.Date(b$DT_ETP,"%d-%b-%y")
b$DT_END <- c(b$DT_ETP[-1]-1, b$DT_ETP[length(b$DT_ETP)]+6)
res <- do.call(rbind,lapply(split(a,list(a$COUNTRY,a$SITEID),drop=TRUE),function(x){
x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y")
do.call(cbind,lapply(split(b,b$TMPT),function(y) {
sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_ETP)
}))
}))
colnames(res) <- paste0("WEEK",colnames(res))
A.K.
On Wednesday, October 16, 2013 10:48 AM, Weijia Wang <zeleehom at gmail.com> wrote:
Hi, Arun
Here I attached a R object with two dataframes.
The first one is the one I need to count the number of dates that fall into a certain week interval.
For example,
STUDYID COUNTRY SITEID SUBJID SCRNDT
1 GRTMD101 USA 13 130101 4-Dec-12
2 GRTMD101 USA 13 130102 4-Dec-12
3 GRTMD101 USA 13 130103 4-Dec-12
4 GRTMD101 USA 6 60101 5-Dec-12
5 GRTMD101 USA 5 50101 5-Dec-12
6 GRTMD101 USA 13 130104 6-Dec-12
So I will need to count number so dates under 'SCRNDT' for each unique 'SITEID' that falls into the
following 'DT_ETP'
STUDYID DT_ETP TMPT
1 GRTMD101 9-Nov-12 1
2 GRTMD101 16-Nov-12 2
3 GRTMD101 23-Nov-12 3
4 GRTMD101 30-Nov-12 4
5 GRTMD101 7-Dec-12 5
6 GRTMD101 14-Dec-12 6
For example, for site 13, the 1st and 2nd SCRNDT both are 4-Dec-12 that falls into the week from '30-Nov-12' to '7-Dec-12', then in the result dataset, it should say, 2 under the variable 'week 4', and the result data frame should look like this:
SITEID WEEK1 WEEK2 WEEK3 WEEK4.......WEEK64 (The values are just an example)
1 0 0 1 2 3
2 1 2 3 5 6
. . . . . .
Here is a code I modified based on the one you sent me, but quite intuitively, and R returns an empty vector to me, lol,
library(plyr)
res <- ddply(a,.(COUNTRY, SITEID), function(x) {
x$SCRNDT <-as.Date(x$SCRNDT,"%d-%b-%y")
unsplit(lapply(split(b,b$TMPT),function(y) {
numweek<-as.numeric(length(unique(b$TMPT)))
for (i in 1:numweek) {
y$DT_ETP <- as.Date(y$DT_ETP, "%d-%b-%y")
sum(x$SCRNDT > y$DT_ETP[i] & x$SCRNDT <= y$DT_ETP[i+1])}}),
b$TMPT)})
I really hope you can teach me on this one again! Thank you so much!!
W
2013/10/15 arun <smartpink111 at yahoo.com>
HI Weijia,
>No problem.
>Let me know if it works. One slight modification as I noticed that your weeks are not overlapping
>#change
>
>
>
>sum(x$Date > y$from & x$Date <= y$to)
>
>#to
>sum(x$Date >= y$from & x$Date <= y$to
>
>Regards,
>Arun
>
>
>
>
>
>
>On Tuesday, October 15, 2013 8:14 PM, Weijia wang <zeleehom at gmail.com> wrote:
>Thank you Arun! I will try it and get back to you! You are amazing!
>
>Weijia Wang
>
>
>> On Oct 15, 2013, at 8:00 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>> Hi,
>>
>> Please use ?dput() to show the dataset. Also, it is not clear about how you store the time interval.
>> dat <- read.table(text="
>> GroupID Date
>> 1 1 10-Dec-12
>> 2 1 11-Dec-12
>> 3 2 13-Dec-12
>> 4 2 15-Dec-12
>> 5 3 06-Dec-12
>> 6 3 19-Dec-12",sep="",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>
>> dat2 <- data.frame(Week=1:2, from= c("9-Dec-12", "16-Dec-12"), to=c("15-Dec-12","22-Dec-12"),stringsAsFactors=FALSE)
>>
>> #Check ?findInterval()
>>
>> res <- t(sapply(split(dat,dat$GroupID), function(x) {
>> x$Date <-as.Date(x$Date,"%d-%b-%y")
>> unsplit(lapply(split(dat2,dat2$Week),function(y) {
>> y$from <- as.Date(y$from, "%d-%b-%y")
>> y$to <- as.Date(y$to, "%d-%b-%y")
>> sum(x$Date > y$from & x$Date <= y$to)}),
>> dat2$Week)
>> }))
>>
>> colnames(res) <- paste0("Week",1:2)
>> res
>> # Week1 Week2
>> #1 2 0
>> #2 2 0
>> #3 0 1
>>
>>
>> A.K.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tuesday, October 15, 2013 6:24 PM, Weijia Wang <zeleehom at gmail.com> wrote:
>> Hi, I have something very interesting:
>>
>> Say I have this:
>>
>> GroupID Date
>> 1 1 10-Dec-12
>> 2 1 11-Dec-12
>> 3 2 13-Dec-12
>> 4 2 15-Dec-12
>> 5 3 06-Dec-12
>> 6 3 19-Dec-12
>>
>> Now, I have time interval,
>>
>> week 1: from 9-Dec-12 to 15-Dec-12,
>>
>> week 2: from 16-Dec-12 to 22-Dec-12, and so on.
>>
>> Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not
>> be counted, 6th row falls into week2.
>>
>> Therefore, by GroupID, I will have
>>
>> GroupID=1, Week1=2, Week2=0
>> GroupID=2, Week1=2, Week2=0
>> GroupID=3, Week1=0, Week2=1.
>>
>> I just want to count the valid date that falls into a 7-day week interval,
>> and I shall have new variables for EACH WEEK, and the counts for dates that
>> fall into this week interval.
>>
>> Can anyone please help me on programming this?
>>
>> W
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
--
--Sent via Gmail
Weijia Wang
Division of Epidemiology
Department of Population Health, School of Medicine
New York University, NY, 10016
More information about the R-help
mailing list