[R] Sub setting multiple ids based on a 2nd data frame
arun
smartpink111 at yahoo.com
Sun Sep 8 07:37:17 CEST 2013
Hi Matt,
I changed the dates a little to include some that fall outside the ranges in dataset B.
A<- read.table(text="
ID Date Depth Temp
1 2002-05-12 10 12
1 2003-05-13 10 12
1 2003-05-14 10 12
1 2004-04-15 10 12
2 2002-05-16 10 12
2 2002-12-17 10 12
2 2003-04-18 10 12
2 2002-05-19 10 12
3 2003-05-10 10 12
3 2004-05-21 10 12
3 2004-05-22 10 12
3 2005-05-10 10 12
3 2006-05-24 10 12
",sep="",header=TRUE,stringsAsFactors=FALSE)
B<- read.table(text="
Year Start End
2002 2002-05-10 2002-11-01
2003 2003-05-11 2003-11-02
2004 2004-05-12 2004-11-03
2005 2005-05-13 2005-11-04
2006 2006-05-14 2006-11-05
",sep="",header=TRUE,stringsAsFactors=FALSE)
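# Extract the year from each date so that rows of A can be matched to B
A$Year<-gsub("-.*","",A$Date)
library(plyr)
# Attach the Start and End dates of the matching year to every row of A
AB<-join(A,B,by="Year")
# TRUE where Date lies within [Start, End]; Date objects compare directly
indx<-(as.Date(AB$Start) <= as.Date(AB$Date)) & (as.Date(AB$Date) <= as.Date(AB$End))
# Keep the matching rows and drop the helper Start and End columns
res<- AB[indx, !(names(AB) %in% c("Start","End"))]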
res
#   ID       Date Depth Temp Year
#1   1 2002-05-12    10   12 2002
#2   1 2003-05-13    10   12 2003
#3   1 2003-05-14    10   12 2003
#5   2 2002-05-16    10   12 2002
#8   2 2002-05-19    10   12 2002
#10  3 2004-05-21    10   12 2004
#11  3 2004-05-22    10   12 2004
#13  3 2006-05-24    10   12 2006
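If you would rather not load plyr, the same filter can probably be done in base R. The following is only a sketch under the assumption that every year appears at most once in B; it uses match() to look up each row's window instead of joining the two data frames. The names i, keep and res2 are mine and are not part of the code above.

```r
# Self-contained copy of the example data (same values as above)
A <- read.table(text="
ID Date Depth Temp
1 2002-05-12 10 12
1 2003-05-13 10 12
1 2003-05-14 10 12
1 2004-04-15 10 12
2 2002-05-16 10 12
2 2002-12-17 10 12
2 2003-04-18 10 12
2 2002-05-19 10 12
3 2003-05-10 10 12
3 2004-05-21 10 12
3 2004-05-22 10 12
3 2005-05-10 10 12
3 2006-05-24 10 12
", header=TRUE, stringsAsFactors=FALSE)
B <- read.table(text="
Year Start End
2002 2002-05-10 2002-11-01
2003 2003-05-11 2003-11-02
2004 2004-05-12 2004-11-03
2005 2005-05-13 2005-11-04
2006 2006-05-14 2006-11-05
", header=TRUE, stringsAsFactors=FALSE)

# Convert the date columns once so they compare as dates, not as strings
A$Date  <- as.Date(A$Date)
B$Start <- as.Date(B$Start)
B$End   <- as.Date(B$End)

# Year of each observation, as an integer so it matches B$Year
A$Year <- as.integer(format(A$Date, "%Y"))

# Row of B holding the window for each row of A (NA if the year is not in B)
i <- match(A$Year, B$Year)

# Keep rows whose Date lies inside that year's [Start, End] window;
# rows with no matching year are dropped
keep <- !is.na(i) & A$Date >= B$Start[i] & A$Date <= B$End[i]
res2 <- A[keep, ]
res2
```

Because A is never merged with B, there are no Start and End columns to remove afterwards, and the original row order and row names of A are kept.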
A.K.
Hi All,
I accidentally posted this in the data.table forum and deleted it to post here.
I have some telemetry data that spans multiple years (2002 - 2013) with
multiple individuals per year. I want to subset the telemetry data to
include only those data points that fall between specific dates which are
provided in a 2nd data frame. The telemetry df is in the form of:
DF "A"
ID Date Depth Temp
1 2002-05-12 10 12
1 2002-05-13 10 12
1 2002-05-14 10 12
1 2002-05-15 10 12
2 2002-05-16 10 12
2 2002-05-17 10 12
2 2002-05-18 10 12
2 2002-05-19 10 12
3 2002-05-20 10 12
3 2002-05-21 10 12
3 2002-05-22 10 12
3 2002-05-23 10 12
3 2002-05-24 10 12
And the df with the dates I want to use to subset is formatted as follows:
DF "B"
Year Start End
2002 2002-05-10 2002-11-01
2003 2003-05-11 2003-11-02
2004 2004-05-12 2004-11-03
2005 2005-05-13 2005-11-04
2006 2006-05-14 2006-11-05
So, for each ID in DF A, I want to subset and keep only those data
points collected on a date that falls between the start and end dates for the
corresponding year in DF B.
I am unsure whether a loop or plyr (which I am unfamiliar with) is my
best bet. I am relatively new to R, so this seems a bit above my head. Any help
is much appreciated.
Thanks in advance!