[R] Comparing dates in dataframes
James Rome
jamesrome at gmail.com
Mon Jan 18 00:47:40 CET 2010
Here are some sample data sets.
I also tried making a combined field in each set, such as
adq=paste(as.character(arr$Date), as.character(arr$quarter))
and similarly for the weather set, so that each set has a single unique
key to compare, but that did not seem to help much.
Thanks,
Jim
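
A combined key of this kind can work when both sides are built the same way and compared with %in% rather than ==. The following is only a minimal sketch: the data frames are toy stand-ins patterned on the samples in this thread, and the names arr_key and weather_key are hypothetical.

```r
# Toy data patterned on the samples in this thread (values are illustrative).
arr <- data.frame(
  Date    = c("1/1/09", "1/1/09", "1/1/09", "1/2/09"),
  quarter = c(59L, 60L, 61L, 60L),
  stringsAsFactors = FALSE
)
weather <- data.frame(
  Date    = c("1/1/09", "1/1/09"),
  quarter = c(60L, 61L),
  stringsAsFactors = FALSE
)

# Build the same date + quarter key on both sides.
arr_key     <- paste(arr$Date, arr$quarter)
weather_key <- paste(weather$Date, weather$quarter)

# %in% asks, for each arrival key, whether it occurs anywhere among the
# good-weather keys, so the two data frames may differ in length.
arr$gw <- arr_key %in% weather_key
print(arr)
```

Both Date columns need the same representation (both character, or both Date) for the pasted keys to agree.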
On 1/17/10 5:50 PM, David Winsemius wrote:
> My guess (since we still have no data on which to test these ideas)
> is that you need either to merge() or to use a matrix created from the
> dates and qtr-hours entries in "gw", since matching on dates and hours
> separately will not uniquely classify the good qtr-hours within their
> proper corresponding dates. You want a structure (or a matching
> process) that takes:
>         qhr1  qhr2  qhr3  qhr4  .......
> date1   good  bad   good  bad
> date2   bad   good  good  good
> date3   bad   bad   bad   good
> .
> .
> .
> and lets you use the values in "arr" to get values in "gw". Notice
> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
> simply will not accomplish anything correct.
>
> Merging by multiple criteria (with the merge function) would do that
> or you could construct a matrix whose entries were the categories good
> /bad. The table function could create the matrix for the purpose of
> using an indexed solution if you are dead-set against the merge concept.
>
>
>
>
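
The two routes described above, merging on both columns or indexing a date-by-quarter matrix, might look roughly like this. It is a sketch under assumptions, not a tested solution for the full data: the data frames are small toy stand-ins, and good, merged, lev_date, lev_qtr and good_mat are hypothetical names.

```r
# Toy stand-ins for the arrivals and good-weather data (illustrative values).
arr <- data.frame(
  Date    = c("1/1/09", "1/1/09", "1/1/09", "1/2/09"),
  quarter = c(59L, 60L, 60L, 60L),
  Flight  = c("COA349", "JBU554", "BTA2347", "AAL842"),
  stringsAsFactors = FALSE
)
weather <- data.frame(
  Date    = c("1/1/09", "1/1/09"),
  quarter = c(60L, 61L),
  stringsAsFactors = FALSE
)

# Approach 1: merge on both columns. all.x = TRUE keeps every arrival;
# arrivals with no matching weather row come back with NA in the flag.
good <- unique(weather[, c("Date", "quarter")])
good$gw <- TRUE
merged <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
merged$gw[is.na(merged$gw)] <- FALSE

# Approach 2: a logical date-by-quarter matrix built with table(), then
# indexed with a two-column character matrix of (date, quarter) pairs.
lev_date <- union(arr$Date, weather$Date)
lev_qtr  <- union(arr$quarter, weather$quarter)
good_mat <- unclass(table(factor(weather$Date, levels = lev_date),
                          factor(weather$quarter, levels = lev_qtr))) > 0
arr$gw <- good_mat[cbind(arr$Date, as.character(arr$quarter))]
```

Note that merge() may return the rows in a different order from arr, whereas the matrix lookup keeps the original row order.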
> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>
>> Thank you Dennis.
>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>> weather$quarter)
>> seems to be what I want to do, but in fact, with the full data set, it
>> misidentifies the rows, so I think the error message must mean
>> something.
>>
>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>> gw = c(length(arrr))
>>> gw[1:length(arrr[,1])]=FALSE
>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>> Warning in `==.default`(arr$Date, weather$Date) :
>> longer object length is not a multiple of shorter object length
>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>> longer object length is not a multiple of shorter object length
>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [260] 0 0 0 0 0 0 0 0
>>
>> There are many many more matches in the 99k line arrival data set.
>>
>> Thanks a bunch,
>> Jim
>>
>>
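
The "longer object length is not a multiple of shorter object length" warning quoted above is R's recycling rule at work: == compares element by element and reuses the shorter vector, so it is a positional comparison rather than a lookup. A small sketch with made-up vectors (a, w, cmp and inn are hypothetical names):

```r
a <- c("d1", "d1", "d2", "d2", "d3")  # e.g. dates of five arrivals
w <- c("d1", "d2")                    # e.g. dates of two weather rows

# w is recycled to d1 d2 d1 d2 d1, so row i of a is compared with
# whatever happens to sit at position i of the recycled w.
cmp <- suppressWarnings(a == w)

# %in% instead asks whether each element of a occurs anywhere in w.
inn <- a %in% w

print(cmp)
print(inn)
```

This is why a result built from == across two data frames of different lengths can look plausible on a tiny sample and still mislabel rows on the full data.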
>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>> Hi:
>>>
>>> To read a data set from a R-help message into R, one uses
>>> read.table(textConnection("<verbatim text>"), ...)
>>>
>>> Your weather data set had
>>> (a) a variable name with a space in it, that R misread and had to be
>>> altered manually;
>>> (b) a missing value with no NA that R interpreted as an incomplete
>>> line; again, it had
>>> to be altered manually.
>>>
>>> This is why David suggested the use of dput(), so that these vagaries
>>> don't have to be
>>> dealt with by those who are trying to help.
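
As an illustration of the two steps mentioned above, the weather sample from this thread can be read from pasted text and then re-exported with dput(). This is a sketch: the header has been renamed to a single word and the missing value written as NA by hand, which are the manual alterations described above, and txt is a hypothetical name.

```r
# Weather sample from this thread, with the two manual fixes applied:
# a one-word column name and an explicit NA for the missing value.
txt <- "Date minute hour quarter efficiency
1/1/09 5 15 60 NA
1/1/09 15 15 61 72
1/1/09 30 15 62 63.3
1/1/09 45 15 63 85.4"

con <- textConnection(txt)
weather <- read.table(con, header = TRUE, stringsAsFactors = FALSE)
close(con)

# dput() prints an R expression that recreates the object exactly,
# which is what makes it convenient for posting sample data.
dput(weather)
```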
>>>
>>> That being said, for the example that you gave and the desired value
>>> that you wanted, try
>>>
>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>> weather$quarter)
>>>
>>> (I changed DateTime to Date in the arr data frame...)
>>>
>>> You'll get warnings like
>>>
>>> Warning messages:
>>> 1: In is.na(e1) | is.na(e2) :
>>> longer object length is not a multiple of shorter object length
>>>
>>> but it seems to do the right thing. The first equality is there to
>>> constrain matches for
>>> quarter to be within the same day.
>>>
>>> For future reference,
>>>
>>>> dput(weather)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>> minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>> ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>> c("Date",
>>> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>>> row.names = c(NA,
>>> -4L))
>>>> dput(arr)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>> weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>> 5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>> quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>> 60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>>> structure(c(6L,
>>> 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>> 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>> "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>> 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>> 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>> "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>> gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>> FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class = "data.frame")
>>>
>>> These can be copied and pasted directly into an R session without
>>> modification.
>>>
>>> HTH,
>>> Dennis
>>>
>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesrome at gmail.com> wrote:
>>>
>>>
>>>
>>>
>>> On 1/17/10 1:06 PM, David Winsemius wrote:
>>>>
>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>>
>>>>> I don't think it is that simple because it is not a one-to-one match.
>>>>> In the arr data frame, there are many arrivals in a quarter hour with
>>>>> good weather on a given day. So I need to match the date and the
>>>>> quarter hour.
>>>>>
>>>>> And all of the rows in the weather data frame are times with good
>>>>> weather--unique date + quarter hour. That is why I needed the loop.
>>>>> For each date and quarter hour in weather, I want to mark all the
>>>>> entries with the corresponding date and quarter hour as TRUE in the
>>>>> arr$gw column.
>>>>>
>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>>> gooddates = function(all, good) {
>>>>> la = length(all) # All the arrivals
>>>>> lw = length(good) # The good 15-minute periods
>>>>> for(j in 1:lw) {
>>>>> d=good$Date[j]
>>>>> q=good$quarter[j]
>>>>> all$gw[all$Date==d && all$quarter==q]=TRUE
>>>>
>>>>
>>>> You are attempting a vectorized test and assignment with "&&" which
>>>> seems unlikely to succeed, but even then I am not sure your problems
>>>> would be over. (I'm also guessing that you might not have reported a
>>>> warning.)
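
To make the point about "&&" concrete: "&&" is meant for single TRUE/FALSE values, while "&" works element by element, which is what a row selection needs. In older R versions "&&" silently used only the first element of each side; recent versions raise an error for longer arguments. The quoted loop also uses length() on a data frame, which returns the number of columns rather than rows. A corrected sketch of the loop, with toy data and the hypothetical names arrivals and hit:

```r
# Toy stand-ins (illustrative values only).
arrivals <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(59L, 60L, 60L, 60L),
  gw      = FALSE
)
good <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01")),
  quarter = c(60L, 61L)
)

# nrow() counts rows; length() on a data frame would count columns.
for (j in seq_len(nrow(good))) {
  # "&" combines the two comparisons row by row, giving one
  # TRUE/FALSE per arrival.
  hit <- arrivals$Date == good$Date[j] & arrivals$quarter == good$quarter[j]
  arrivals$gw[hit] <- TRUE
}
print(arrivals)
```

A loop like this is correct but does one pass over all arrivals per weather row, so it may be slow on a large arrivals set.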
>>>
>>> Why shouldn't the && succeed? You are correct there: I do get items
>>> if I use either half of this "and" test alone, but when I insert the
>>> &&, I get no hits. And I got no warnings.
>>>>
>>>> Why not merge arr to gw by date and quarter?
>>> The sets contain different data, and the only thing I want from the
>>> weather set is the fact that it has an entry for a given date and time.
>>>>
>>>> Answering these questions would be greatly speeded up with a small
>>>> sample dataset. Are you aware of the virtues of the dput function?
>>>>
>>>
>>> What I want is for a 1 to be in the gw column for arrivals in quarters
>>> 60, 61, 62, 63, ...
>>>
>>> For example, here is some data from the good weather set:
>>> Date    minute  hour  quarter  Efficiency Val
>>> 1/1/09       5    15       60
>>> 1/1/09      15    15       61  72
>>> 1/1/09      30    15       62  63.3
>>> 1/1/09      45    15       63  85.4
>>>
>>>
>>>
>>> And this is from the arrivals set:
>>> DateTime  weekday  month  quarter  ICAO  Flight   gw
>>> 1/1/09          5      1       59  COA   COA349    0
>>> 1/1/09          5      1       59  NWA   NWA9934   0
>>> 1/1/09          5      1       60  JBU   JBU554    0
>>> 1/1/09          5      1       60  BTA   BTA2347   0
>>> 1/1/09          5      1       60  COA   COA886    0
>>> 1/1/09          5      1       60  BTA   BTA2916   0
>>> 1/1/09          5      1       60  CJC   CJC3225   0
>>> 1/1/09          5      1       60  BTA   BTA2085   0
>>> 1/1/09          5      1       60  BTA   BTA2064   0
>>> 1/1/09          5      1       60  AAL   AAL842    0
>>> 1/1/09          5      1       60  BTA   BTA1234   0
>>> 1/1/09          5      1       60  CJC   CJC3359   0
>>> 1/1/09          5      1       60  BTA   BTA3072   0
>>> 1/1/09          5      1       61  BTA   BTA3086   0
>>> 1/1/09          5      1       61  COA   COA1166   0
>>> 1/1/09          5      1       61  COA   COA855    0
>>> 1/1/09          5      1       61  AWE   AWE307    0
>>> 1/1/09          5      1       66  CHQ   CHQ5312   0
>>> 1/1/09          5      1       67  BTA   BTA2405   0
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>