[R] Comparing dates in dataframes

James Rome jamesrome at gmail.com
Mon Jan 18 00:47:40 CET 2010


Here are some sample data sets.

I also tried making a combined field in each set such as
adq=paste(as.character(arr$Date), as.character(arr$quarter))
and similarly for the weather set, so I have unique single things to
compare, but that did not seem to help much.
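
For reference, a minimal sketch of how such a combined key might be used with %in%. The toy data below are invented for illustration (they are not the real arrival or weather sets), and the names arr_key and weather_key are hypothetical.

```r
# Invented toy data: good-weather quarter hours on two dates.
weather <- data.frame(Date = c("1/1/09", "1/1/09", "1/2/09"),
                      quarter = c(60L, 61L, 62L),
                      stringsAsFactors = FALSE)
# Invented toy arrivals; some share a quarter with weather but on another date.
arr <- data.frame(Date = c("1/1/09", "1/1/09", "1/1/09", "1/2/09", "1/2/09"),
                  quarter = c(59L, 60L, 62L, 62L, 60L),
                  stringsAsFactors = FALSE)

# One key per row, built from BOTH columns, so a date and a quarter hour
# match only when they occur together in the same weather row.
arr_key     <- paste(as.character(arr$Date), arr$quarter)
weather_key <- paste(as.character(weather$Date), weather$quarter)

# %in% returns one TRUE/FALSE per element of arr_key regardless of how long
# weather_key is, so nothing is recycled.
arr$gw <- arr_key %in% weather_key

# For contrast: testing the two columns separately accepts any date that
# appears in weather combined with any quarter that appears in weather.
separate <- arr$Date %in% weather$Date & arr$quarter %in% weather$quarter

print(arr)
print(separate)
```

In this toy case the combined key flags only the second and fourth arrivals, while the separate tests also flag quarter 62 on 1/1/09 and quarter 60 on 1/2/09, which are not good-weather periods on those dates.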

Thanks,
Jim
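
The merge() route suggested in the quoted reply below could look roughly like this. It is only a sketch under assumptions: the toy data and the Flight labels are invented, and arr is assumed not to have a gw column yet (otherwise merge() would produce gw.x and gw.y).

```r
# Invented toy data, same shape as the combined-key idea: Date + quarter.
weather <- data.frame(Date = c("1/1/09", "1/1/09", "1/2/09"),
                      quarter = c(60L, 61L, 62L),
                      stringsAsFactors = FALSE)
arr <- data.frame(Date = c("1/1/09", "1/1/09", "1/1/09", "1/2/09", "1/2/09"),
                  quarter = c(59L, 60L, 62L, 62L, 60L),
                  Flight = c("F1", "F2", "F3", "F4", "F5"),  # hypothetical labels
                  stringsAsFactors = FALSE)

# Keep only the two matching columns from weather and mark each row as good.
good <- unique(weather[, c("Date", "quarter")])
good$gw <- TRUE

# Left join: every arrival is kept (all.x = TRUE); arrivals with no matching
# (Date, quarter) pair in good get NA in gw.
merged <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
merged$gw[is.na(merged$gw)] <- FALSE

print(merged)
```

Note that merge() sorts the result by the key columns by default, so the row order of merged can differ from that of arr.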

On 1/17/10 5:50 PM, David Winsemius wrote:
> My guess (since we still have no data on which to test these ideas) 
> is that you need either to merge() or to use a matrix created from the
> dates and qtr-hours entries in "gw", since matching on dates and hours
> separately will not uniquely classify the good qtr-hours within their
> proper corresponding dates. You want a structure (or a matching
> process) that takes:
>          qhr1    qhr2    qhr3    qhr4   ...
> date1    good    bad     good    bad
> date2    bad     good    good    good
> date3    bad     bad     bad     good
> ...
> and lets you use the values in "arr" to get values in "gw". Notice
> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
> simply will not accomplish anything correct.
>
> Merging by multiple criteria (with the merge function) would do that
> or you could construct a matrix whose entries were the categories good
> /bad. The table function could create the matrix for the purpose of
> using an indexed solution if you are dead-set against the merge concept.
>
>
>
>
> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>
>> Thank you Dennis.
>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>> weather$quarter)
>> seems to be what I want to do, but in fact, with the full data set, it
>> misidentifies the rows, so I think the error message must mean
>> something.
>>
>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>> gw = c(length(arrr))
>>> gw[1:length(arrr[,1])]=FALSE
>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>> Warning in `==.default`(arr$Date, weather$Date) :
>>  longer object length is not a multiple of shorter object length
>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>  longer object length is not a multiple of shorter object length
>>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0
>> [260] 0 0 0 0 0 0 0 0
>>
>> There are many many more matches in the 99k line arrival data set.
>>
>> Thanks a bunch,
>> Jim
>>
>>
>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>> Hi:
>>>
>>> To read a data set from an R-help message into R, one uses
>>> read.table(textConnection("<verbatim text>"), ...)
>>>
>>> Your weather data set had
>>> (a) a variable name with a space in it, that R misread and had to be
>>>     altered manually;
>>> (b) a missing value with no NA that R interpreted as an incomplete
>>>     line; again, it had to be altered manually.
>>>
>>> This is why David suggested the use of dput(), so that these vagaries
>>> don't have to be dealt with by those who are trying to help.
>>>
>>> That being said, for the example that you gave and the desired value
>>> that you wanted, try
>>>
>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>> weather$quarter)
>>>
>>> (I changed DateTime to Date in the arr data frame...)
>>>
>>> You'll get warnings like
>>>
>>> Warning messages:
>>> 1: In is.na(e1) | is.na(e2) :
>>>  longer object length is not a multiple of shorter object length
>>>
>>> but it seems to do the right thing. The first equality is there to
>>> constrain matches for quarter to be within the same day.
>>>
>>> For future reference,
>>>
>>>> dput(weather)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>>    minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>>    ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>> c("Date",
>>> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>>> row.names = c(NA,
>>> -4L))
>>>> dput(arr)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>>    weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>>    5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>>    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>    quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>>    60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>>> structure(c(6L,
>>>    8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>>    2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>>    "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>>    19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>>    14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>>    "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>>    "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>>    "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>>    gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>    TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>>    FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class = "data.frame")
>>>
>>> These can be copied and pasted directly into an R session without
>>> modification.
>>>
>>> HTH,
>>> Dennis
>>>
>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesrome at gmail.com> wrote:
>>>
>>>
>>>
>>>
>>>    On 1/17/10 1:06 PM, David Winsemius wrote:
>>>>
>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>>
>>>>> I don't think it is that simple because it is not a one-to-one match. In
>>>>> the arr data frame, there are many arrivals in a quarter hour with good
>>>>> weather on a given day. So I need to match the date and the quarter
>>>>> hour.
>>>>>
>>>>> And all of the rows in the weather data frame are times with good
>>>>> weather--unique date + quarter hour. That is why I needed the loop. For
>>>>> each date and quarter hour in weather, I want to mark all the entries
>>>>> with the corresponding date and weather as TRUE in the arr$gw column.
>>>>>
>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>>> gooddates = function(all, good) {
>>>>>  la = length(all)   # All the arrivals
>>>>> lw = length(good)  # The good 15-minute periods
>>>>> for(j in 1:lw) {
>>>>>   d=good$Date[j]
>>>>>   q=good$quarter[j]
>>>>>   all$gw[all$Date==d && all$quarter==q]=TRUE
>>>>
>>>>
>>>> You are attempting a vectorized test and assignment with "&&" which
>>>> seems unlikely to succeed, but even then I am not sure your problems
>>>> would be over. (I'm also guessing that you might not have reported a
>>>> warning.)
>>>
>>>    Why shouldn't the && succeed? You are correct there, because I do get
>>>    items if I use either part of this and test, when I insert the &&,
>>>    I get no hits. And I got no warnings.
>>>>
>>>> Why not merge arr to gw by date and quarter?
>>>    The sets contain different data, and the only thing I want from the
>>>    weather set is the fact that it has an entry for a given date and time.
>>>>
>>>> Answering these questions would be greatly speeded up with a small
>>>> sample dataset. Are you aware of the virtues of the dput function?
>>>>
>>>
>>>    What I want is for a 1 to be in the gw column in the quarter
>>>    60,61,62,63,...
>>>
>>>    For example, here is some data from the good weather set:
>>>    Date    minute  hour    quarter         Efficiency Val
>>>    1/1/09  5       15      60
>>>    1/1/09  15      15      61      72
>>>    1/1/09  30      15      62      63.3
>>>    1/1/09  45      15      63      85.4
>>>
>>>
>>>
>>>    And this is from the arrivals set:
>>>    DateTime        weekday         month   quarter         ICAO    Flight          gw
>>>
>>>    1/1/09  5       1       59      COA     COA349          0
>>>    1/1/09  5       1       59      NWA     NWA9934         0
>>>    1/1/09  5       1       60      JBU     JBU554          0
>>>    1/1/09  5       1       60      BTA     BTA2347         0
>>>    1/1/09  5       1       60      COA     COA886          0
>>>    1/1/09  5       1       60      BTA     BTA2916         0
>>>    1/1/09  5       1       60      CJC     CJC3225         0
>>>    1/1/09  5       1       60      BTA     BTA2085         0
>>>    1/1/09  5       1       60      BTA     BTA2064         0
>>>    1/1/09  5       1       60      AAL     AAL842          0
>>>    1/1/09  5       1       60      BTA     BTA1234         0
>>>    1/1/09  5       1       60      CJC     CJC3359         0
>>>    1/1/09  5       1       60      BTA     BTA3072         0
>>>    1/1/09  5       1       61      BTA     BTA3086         0
>>>    1/1/09  5       1       61      COA     COA1166         0
>>>    1/1/09  5       1       61      COA     COA855          0
>>>    1/1/09  5       1       61      AWE     AWE307          0
>>>    1/1/09  5       1       66      CHQ     CHQ5312         0
>>>    1/1/09  5       1       67      BTA     BTA2405         0
>>>
>>>
>>>
>>>           [[alternative HTML version deleted]]
>>>
>>>    ______________________________________________
>>>    R-help at r-project.org <mailto:R-help at r-project.org> mailing list
>>>    https://stat.ethz.ch/mailman/listinfo/r-help
>>>    PLEASE do read the posting guide
>>>    http://www.R-project.org/posting-guide.html
>>>    and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>

