[R] Create new records based on event dates in a data frame

arun smartpink111 at yahoo.com
Thu Aug 15 16:29:13 CEST 2013


You might also try (could be faster):
library(data.table)
 dt1<- data.table(df,key=c("case","obsdate"))
dt1[,date_end:=obsdate-1]
 dt1[,date_end:=c(date_end[-1],as.Date("2011-03-31")),by=case]
dt1<-subset(dt1,select=c(1,2,4,3))
 dt1
#   case    obsdate   date_end score
#1:    a 2001-04-01 2007-05-19    60
#2:    a 2007-05-20 2010-10-07    72
#3:    a 2010-10-08 2011-03-31    85
#4:    b 2001-04-01 2005-11-09    72
#5:    b 2005-11-10 2011-03-31    79
#6:    c 2001-04-01 2011-03-31    65
 identical(df1,as.data.frame(dt1))
#[1] TRUE


A.K.



----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Gavin Rudge <g.rudge at bham.ac.uk>
Cc: R help <r-help at r-project.org>
Sent: Thursday, August 15, 2013 10:10 AM
Subject: Re: [R] Create new records based on event dates in a data frame

Hi,One way would be:
df<- data.frame(case,obsdate=as.Date(obsdate,format="%d/%m/%Y"),score,stringsAsFactors=FALSE)  #using as.data.frame(cbind(...  should be avoided

df$date_end<-as.Date(unlist(lapply(with(df,tapply(obsdate,case,FUN=function(x) x-1)),function(x) c(x[-1],as.Date("31/03/2011",format="%d/%m/%Y")))),origin="1970-01-01")
 df1<- df[,c(1,2,4,3)]
 df1
#  case    obsdate   date_end score
#1    a 2001-04-01 2007-05-19    60
#2    a 2007-05-20 2010-10-07    72
#3    a 2010-10-08 2011-03-31    85
#4    b 2001-04-01 2005-11-09    72
#5    b 2005-11-10 2011-03-31    79
#6    c 2001-04-01 2011-03-31    65
A.K.



----- Original Message -----
From: Gavin Rudge <g.rudge at bham.ac.uk>
To: "'r-help at r-project.org'" <r-help at r-project.org>
Cc: 
Sent: Thursday, August 15, 2013 9:22 AM
Subject: [R] Create new records based on event dates in a data frame

One of those simple tasks, but I can't get to first base with it.  I've got a data set of observations of subjects over a 10 year period beginning on 1st April 2001 and ending on 31st March 2011.  One of may variables is a score based on an intervention on a given date. Before the intervention there is baseline score, on the day of intervention the score changes as a result.  If there is no observation the baseline score remains constant over the entire period.

Now I have a data.frame with subject ids, the baseline score with a baseline date of 01/04/2001, a baseline score , and the date of any subsequent observations and the scores resulting from them.

Here is a rough approximation with just three subjects, one with two interventions one with one, and one with none.  My actual data set has about 30,000 observations most of them with one or two interventions.

case=c("a","a","a","b","b","c")
obsdate<-c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
score=c(60,72,85,72,79,65)
df<-as.data.frame(cbind(case,obsdate,score))
df$obsdate<-as.Date(df$obsdate,format="%d/%m/%Y")
df

Now the data set I am trying to obtain for my analysis will consist of exposure periods for each subject, with the start and end date and the score during the period of exposure. So each subject will have at least one exposure period beginning on the start date and a score.  In those cases where there has been an intervention (most of them) the next exposure period will start on the day of intervention, and the earlier period will end the day before. If there are no subsequent interventions between the start of one and the end of the study period, 31/03/2011, then the last exposure period is censored at this date. Where there is no intervention at all, (case 'c' is an example) the exposure period is the duration of the study, from 01/04/2001 to 31/03/2011.

So for the above example my resulting data frame should look like this:

exp_case=c("a","a","a","b","b","c")
date_begin=c("01/04/2001","20/05/2007","08/10/2010","01/04/2001","10/11/2005","01/04/2001")
date_end=c("19/05/2001","07/10/2010","31/03/2011","09/11/2005","31/03/2011","31/03/2011")
exp_score=c(60,72,85,72,79,65)
expdata<-as.data.frame(cbind(exp_case,date_begin,date_end,exp_score))
expdata$date_begin<-as.Date(expdata$date_begin,format="%d/%m/%Y")
expdata$date_end<-as.Date(expdata$date_end,format="%d/%m/%Y")
expdata

Sorry about the clunky way I've handled the dates, this is the only way I know how to do this. 

All assistance gratefully received

GavinR 

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list