[R] sub-setting rows based on dates in R
Jim Lemon
drjimlemon at gmail.com
Thu Feb 2 23:04:30 CET 2017
It looks to me as though the problem is in your input data. The following
example works for me:
df1<-data.frame(Date=paste(rep(6:8,each=20),
c(sort(sample(1:30,20)),sort(sample(1:30,20)),sort(sample(1:30,20))),
2016,sep="/"),
Rainfall_Duration=sample(c(10,20,30,40),60,TRUE))
df1$Date<-as.Date(df1$Date,"%m/%d/%Y")
df2<-data.frame(Date=paste(rep(6:8,each=10),
c(sort(sample(10:30,10)),sort(sample(10:30,10)),sort(sample(10:30,10))),
2016,sep="/"),
Removal.Rate=runif(30,10,60))
df2$Date<-as.Date(df2$Date,"%m/%d/%Y")
df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)
df3row<-0
for(i in 1:dim(df2)[1]) {
rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
# if there are no dates in df1 within the prior 7 days
if(!length(rdrows)) {
# first check if at least one date in df1 is less than the df2
# date and is not included in the last set of df1 dates
checkrows<-which(df2$Date[i] >= df1$Date)
# use the last date greater than the maximum in lastrows
if(any(checkrows > lastrows))
rdrows<-max(checkrows[checkrows > lastrows])
# otherwise use the last set
else rdrows<-lastrows
}
# save the current set of dates
lastrows<-rdrows
# get the number of new rows
nrows<-length(rdrows)
for(row in 1:nrows) {
# set the values in each row
df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
}
df3row<-df3row+nrows
}
names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
df3
Jim
On Fri, Feb 3, 2017 at 4:05 AM, Md Sami Bin Shokrana <samimist at live.com>
wrote:
>
> Hi Jim,
>
> Thanks a lot. As you already mentioned, this is a clunky method. This code
> runs fine with the sample data I provided here. But my real data is much
> larger (more than 50 observations for df1) and this code does not work. I
> am getting this error:
> Error in `*tmp*`[[j]] : subscript out of bounds
> Do you have any solution for that? You have already done a lot. So, I
> really appreciate your effort. Thanks
>
> ------------------------------
> *From:* Jim Lemon <drjimlemon at gmail.com>
> *Sent:* Thursday, February 2, 2017 2:07 PM
>
> *To:* Md Sami Bin Shokrana; r-help mailing list
> *Subject:* Re: [R] sub-setting rows based on dates in R
>
> Hi Md,
> What I have done is to use the most recent intervening date between the
> last set of dates if any are there, otherwise the last set of dates. That
> is what I understand from your description.
>
> Remember that this is a very clunky way to do something like this by
> adding rows to a data frame, and it is likely to scale up to large data
> sets badly.
>
> df1<-read.table(text="Date Rainfall_Duration
> 6/14/2016 10
> 6/15/2016 20
> 6/17/2016 10
> 8/16/2016 30
> 8/19/2016 40
> 8/21/2016 20
> 9/4/2016 10",
> header=TRUE,stringsAsFactors=FALSE)
> # change the character strings in df2$Date to Date values
> df1$Date<-as.Date(df1$Date,"%m/%d/%Y")
>
> df2<-read.table(text="Date Removal.Rate
> 6/17/2016 64.7
> 6/30/2016 22.63
> 7/14/2016 18.18
> 8/19/2016 27.87
> 8/30/2016 23.45
> 9/2/2016 17.2",
> header=TRUE,stringsAsFactors=FALSE)
> # change the character strings in df2$Date to Date values
> df2$Date<-as.Date(df2$Date,"%m/%d/%Y")
>
> df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)
>
> df3row<-0
>
> for(i in 1:dim(df2)[1]) {
> rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
> # if there are no dates in df1 within the prior 7 days
> if(!length(rdrows)) {
> # first check if at least one date in df1 is less than the df2
> # date and is not included in the last set of df1 dates
> checkrows<-which(df2$Date[i] >= df1$Date)
> # use the last date greater than the maximum in lastrows
> if(any(checkrows > lastrows))
> rdrows<-max(checkrows[checkrows > lastrows])
> # otherwise use the last set
> else rdrows<-lastrows
> }
> # save the current set of dates
> lastrows<-rdrows
> # get the number of new rows
> nrows<-length(rdrows)
> for(row in 1:nrows) {
> # set the values in each row
> df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
> df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
> df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
> }
> # keep count of the current number of rows
> df3row<-df3row+nrows
> }
>
> names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
> df3
>
> Jim
>
>
> On Thu, Feb 2, 2017 at 4:58 AM, Md Sami Bin Shokrana <samimist at live.com>
> wrote:
>
>> Hi Jim,
>>
>> Thank you so much for your help. Your code works great. Could you please
>> explain your code a bit? One more thing, I am so sorry that I forgot to
>> mention one more criteria in my post. If it is not much trouble, could you
>> please help me out with that? I have added a couple more observations
>> (the bold ones) to each of my data frames which are shown below:
>>
>>
>> The main concept is,
>>
>> (i) For a specific date in df2, if no matching dates are available in df1
>> within the 7 days range, the code will keep on looking for the latest
>> available date in df1 with a "Rainfall_Duration" data. For example, in df2,
>> for *8/30/2016*, there is no "Rainfall_Duration" data available in
>> df1 within the prior 7 days range. So, I want the code to keep on looking
>> for dates in df1 until there is an available data for "Rainfall_Duration"
>> in df1 (in this case which is * 8/21/2016)* .
>>
>>
>> (ii) Additionally, for* 9/2/2016 *(df2), there is no date available in
>> df1 with a "Rainfall_Duration" data within prior 7 days range. The latest
>> available data for "Rainfall_Duration" is *8/21/2016*. So, the code will
>> extract the same result we had for *8/30/2016* in df2.
>>
>> In simpler words, i just want the code to keep on looking for data with
>> "Rainfall_Duraiton" in df1 if there is none available within the prior 7
>> days range. Sorry for not mentioning it before.
>>
>>
>>
>> df1 <-
>>
>> Date Rainfall_Duration
>> 6/14/2016 10
>> 6/15/2016 20
>> 6/17/2016 10
>> 8/16/2016 30
>> 8/19/2016 40
>> *8/21/2016* *20* *9/4/2016 10*
>>
>>
>> df2 <-
>>
>> Date Removal.Rate
>> 6/17/2016 64.7
>> 6/30/2016 22.63
>> 7/14/2016 18.18
>> 8/19/2016 27.87
>> *8/30/2016* *23.45* *9/2/2016 17.2*
>>
>>
>> Expected output:
>>
>>
>> df3 <-
>>
>> Rate.Removal.Date Date Rainfall_Duration
>> 6/17/2016 6/14/2016 10
>> 6/17/2016 6/15/2016 20
>> 6/17/2016 6/17/2016 10
>> 6/30/2016 6/14/2016 10
>> 6/30/2016 6/15/2016 20
>> 6/30/2016 6/17/2016 10
>> 7/14/2016 6/14/2016 10
>> 7/14/2016 6/15/2016 20
>> 7/14/2016 6/17/2016 10
>> 8/19/2016 8/16/2016 30
>> 8/19/2016 8/19/2016 40
>> *8/30/2016* *8/21/2016* *20* *9/2/2016 8/21/2016
>> 20*
>>
>>
>> Thanks in advance.
>>
>>
>> ------------------------------
>> *From:* Jim Lemon <drjimlemon at gmail.com>
>> *Sent:* Wednesday, February 1, 2017 1:18 PM
>> *To:* Md Sami Bin Shokrana; r-help mailing list
>> *Subject:* Re: [R] sub-setting rows based on dates in R
>>
>> Hi Md,
>> This kind of clunky, but it might do what you want.
>>
>> df1<-read.table(text="Date Rainfall_Duration
>> 6/14/2016 10
>> 6/15/2016 20
>> 6/17/2016 10
>> 8/16/2016 30
>> 8/19/2016 40",
>> header=TRUE,stringsAsFactors=FALSE)
>>
>> df1$Date<-as.Date(df1$Date,"%m/%d/%Y")
>>
>> df2<-read.table(text="Date Removal.Rate
>> 6/17/2016 64.7
>> 6/30/2016 22.63
>> 7/14/2016 18.18
>> 8/19/2016 27.87",
>> header=TRUE,stringsAsFactors=FALSE)
>>
>> df2$Date<-as.Date(df2$Date,"%m/%d/%Y")
>>
>> df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)
>>
>> df3row<-0
>>
>> for(i in 1:dim(df2)[1]) {
>> rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
>> if(!length(rdrows)) rdrows<-lastrows
>> lastrows<-rdrows
>> nrows<-length(rdrows)
>> for(row in 1:nrows) {
>> df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
>> df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
>> df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
>> }
>> df3row<-df3row+nrows
>> }
>>
>> names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
>> df3
>>
>> Jim
>>
>> On Wed, Feb 1, 2017 at 3:48 AM, Md Sami Bin Shokrana <samimist at live.com>
>> wrote:
>> > Hello guys, I am trying to solve a problem in R. I have 2 data frames
>> which look like this:
>> > df1 <-
>> > Date Rainfall_Duration
>> > 6/14/2016 10
>> > 6/15/2016 20
>> > 6/17/2016 10
>> > 8/16/2016 30
>> > 8/19/2016 40
>> >
>> > df2 <-
>> > Date Removal.Rate
>> > 6/17/2016 64.7
>> > 6/30/2016 22.63
>> > 7/14/2016 18.18
>> > 8/19/2016 27.87
>> >
>> > I want to look up the dates from df2 in df1 and their corresponding
>> Rainfall_Duration data. For example, I want to look for the 1st date of df2
>> in df1 and subset rows in df1 for that specific date and 7 days prior to
>> that. additionally, for example: for 6/30/2016 (in df2) there is no dates
>> available in df1 within it's 7 days range. So, in this case I just want to
>> extract the results same as it's previous date (6/17/2016) in df2. Same
>> logic goes for 7/14/2016(df2).
>> > The output should look like this:
>> >
>> > df3<-
>> >
>> > Rate.Removal.Date Date Rainfall_Duration
>> > 6/17/2016 6/14/2016 10
>> > 6/17/2016 6/15/2016 20
>> > 6/17/2016 6/17/2016 10
>> > 6/30/2016 6/14/2016 10
>> > 6/30/2016 6/15/2016 20
>> > 6/30/2016 6/17/2016 10
>> > 7/14/2016 6/14/2016 10
>> > 7/14/2016 6/15/2016 20
>> > 7/14/2016 6/17/2016 10
>> > 8/19/2016 8/16/2016 30
>> > 8/19/2016 8/19/2016 40
>> >
>> > I could subset data for the 7 days range. But could not do it when no
>> dates are available in that range. I have the following code:
>> > library(plyr)
>> > library (dplyr)
>> > df1$Date <- as.Date(df1$Date,format = "%m/%d/%Y")
>> > df2$Date <- as.Date(df2$Date,format = "%m/%d/%Y")
>> >
>> > df3 <- lapply(df2$Date, function(x){
>> > filter(df1, between(Date, x-7, x))
>> > })
>> >
>> > names(df3) <- as.character(df2$Date)
>> > bind_rows(df3, .id = "Rate.Removal.Date")
>> > df3 <- ldply (df3, data.frame, .id = "Rate.Removal.Date")
>> >
>> > I hope I could explain my problem properly. I would highly appreciate
>> if someone can help me out with this code or a new one. Thanks in advance.
>> >
>> >
>> >
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> R-help Info Page - Homepage - SfS – Seminar for Statistics
>> <https://stat.ethz.ch/mailman/listinfo/r-help>
>> stat.ethz.ch
>> The main R mailing list, for announcements about the development of R and
>> the availability of new code, questions and answers about problems and
>> solutions using R ...
>>
>>
>> > PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list