[R] Comparing dates in dataframes

David Winsemius dwinsemius at comcast.net
Sun Jan 17 23:50:56 CET 2010


My guess (since we still have no data on which to test these ideas)
is that you need either to merge() or to use a matrix created from the
dates and qtr-hours entries in "gw", since matching on dates and on
qtr-hours separately will not tie each good qtr-hour to its proper
date. You want a structure (or a matching process) that takes:
	qhr1	qhr2	qhr3	qhr4 .......
date1	good	bad	good	bad
date2	bad	good	good	good
date3	bad	bad	bad	good
.
.
.
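and lets you use the values in "arr" to look up values in "gw". Notice
that the test arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
simply will not give a correct result: it accepts any arrival whose
date and whose qtr-hour each appear somewhere in "gw", even if never
in the same row.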
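A minimal sketch of that point, using small made-up data frames (the values are hypothetical; the column names Date, date and qtrhr simply mirror the expressions above). The two separate %in% tests accept an arrival whose date and whose qtr-hour each occur in "gw" but never together, whereas a single pasted date-plus-qtr-hour key matches only exact pairs.

```r
# Hypothetical good-weather slots: qtr-hour 60 on day 1, qtr-hour 61 on day 2
gw <- data.frame(date  = c("2009-01-01", "2009-01-02"),
                 qtrhr = c(60L, 61L),
                 stringsAsFactors = FALSE)

# Hypothetical arrivals
arr <- data.frame(Date  = c("2009-01-01", "2009-01-01", "2009-01-02"),
                  qtrhr = c(60L, 61L, 61L),
                  stringsAsFactors = FALSE)

# Separate tests: row 2 (day 1, qtr-hour 61) is wrongly accepted, because
# day 1 and qtr-hour 61 each appear in gw, just not in the same row
separate <- arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
separate   # TRUE TRUE TRUE

# One combined key per row: only exact (date, qtr-hour) pairs match
combined <- paste(arr$Date, arr$qtrhr) %in% paste(gw$date, gw$qtrhr)
combined   # TRUE FALSE TRUE
```

The paste() key relies on the separator (a space by default) not occurring inside the values in a way that could make two different pairs collide; for dates and integer qtr-hours that holds.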

Merging on multiple keys (with the merge function) would do that,
or you could construct a matrix whose entries are the categories
good/bad. The table function could create that matrix for an indexed
solution if you are dead-set against the merge concept.
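Both routes can be sketched on made-up data (the values are hypothetical; the column names Date and quarter follow the arr and weather data frames that appear further down this thread). The merge() call keeps every arrival via all.x = TRUE and marks those that found a good-weather row; the table() route builds a logical date-by-quarter matrix and indexes it with a two-column character matrix, which R matches against the dimnames.

```r
# Hypothetical toy data in the layout of the thread's data frames
weather <- data.frame(Date    = c("1/1/09", "1/1/09", "1/2/09"),
                      quarter = c(60L, 61L, 61L),
                      stringsAsFactors = FALSE)
arr <- data.frame(Date    = c("1/1/09", "1/1/09", "1/1/09", "1/2/09"),
                  quarter = c(59L, 60L, 61L, 60L),
                  Flight  = c("COA349", "JBU554", "BTA3086", "AAL842"),
                  stringsAsFactors = FALSE)

## 1. merge() on both keys
good <- unique(weather[, c("Date", "quarter")])   # keep only the key columns
good$gw <- 1
m <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
m$gw[is.na(m$gw)] <- 0          # arrivals with no good-weather match
# merge() sorts by the key columns, so m need not be in arr's row order

## 2. Indexed lookup through a date x quarter matrix built by table()
tab <- unclass(table(weather$Date, weather$quarter)) > 0
idx <- cbind(arr$Date, as.character(arr$quarter))
# Character-matrix indexing errors on unknown names, so screen them first
ok  <- idx[, 1] %in% rownames(tab) & idx[, 2] %in% colnames(tab)
arr$gw <- 0
arr$gw[ok] <- as.numeric(tab[idx[ok, , drop = FALSE]])
arr$gw   # 0 1 1 0
```

The merge() version is the simpler one to read; the matrix version keeps arr in its original order without any re-sorting.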




On Jan 17, 2010, at 4:47 PM, James Rome wrote:

> Thank you Dennis.
> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
> seems to be what I want to do, but in fact, with the full data set, it
> misidentifies the rows, so I think the error message must mean something.
>
>> arr$Date <- as.Date(as.character(ewr$Date), format="%m/%d/%y")
>> weather$Date <- as.Date(as.character(weather$Date), format="%m/%d/%y")
>> gw = c(length(arr))
>> gw[1:length(arr[,1])]=FALSE
>> gw[arr$Date==weather$Date & weather$quarter %in% arr$quarter]
> Warning in `==.default`(arr$Date, weather$Date) :
>  longer object length is not a multiple of shorter object length
> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>  longer object length is not a multiple of shorter object length
>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0
> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0
> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0
> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
> 0 0
> 0 0 0 0
> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
> 0 0
> 0 0 0 0
> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
> 0 0
> 0 0 0 0
> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
> 0 0
> 0 0 0 0
> [260] 0 0 0 0 0 0 0 0
>
> There are many, many more matches in the 99k-line arrival data set.
>
> Thanks a bunch,
> Jim
>
>
> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>> Hi:
>>
>> To read a data set from an R-help message into R, one uses
>> read.table(textConnection("<verbatim text>"), ...)
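For instance, a minimal sketch of reading a cleaned-up copy of the weather sample posted further down this thread, assuming the space in the "Efficiency Val" header has been replaced by one name (efficiency, as in the dput() output in this message) and the empty cell filled with NA; these are the two manual fixes this message describes.

```r
# Cleaned copy of the weather sample: one-word header, NA in the empty cell
weather_txt <- "Date    minute  hour  quarter  efficiency
1/1/09  5       15    60       NA
1/1/09  15      15    61       72
1/1/09  30      15    62       63.3
1/1/09  45      15    63       85.4"

con <- textConnection(weather_txt)
weather <- read.table(con, header = TRUE, stringsAsFactors = FALSE)
close(con)   # read.table() leaves an already-open connection open
str(weather)
```

Current versions of R also accept read.table(text = weather_txt, ...), which skips the explicit connection.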
>>
>> Your weather data set had
>> (a) a variable name with a space in it, which R misread and which had to
>>     be altered manually;
>> (b) a missing value with no NA, which R interpreted as an incomplete
>>     line; again, it had to be altered manually.
>>
>> This is why David suggested the use of dput(), so that these vagaries
>> don't have to be dealt with by those who are trying to help.
>>
>> That being said, for the example that you gave and the desired value
>> that you wanted, try
>>
>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>
>> (I changed DateTime to Date in the arr data frame...)
>>
>> You'll get warnings like
>>
>> Warning messages:
>> 1: In is.na(e1) | is.na(e2) :
>>  longer object length is not a multiple of shorter object length
>>
>> but it seems to do the right thing. The first equality is there to
>> constrain matches for quarter to be within the same day.
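The warnings come from recycling: == compares position by position and repeats the shorter vector. In this sample every Date is 1/1/09, so the comparison is TRUE however the rows line up; with many dates it pairs arbitrary rows of the two data frames, so it does not actually tie each quarter to its day. A toy illustration with made-up vectors:

```r
# Made-up vectors, only to show recycling
long  <- c(1, 2, 3, 4, 5)   # plays the role of arr$Date (19 rows)
short <- c(2, 1)            # plays the role of weather$Date (4 rows)

# `==` recycles `short` to c(2, 1, 2, 1, 2) and warns, because 5 is not
# a multiple of 2; it asks "is long[i] equal to the recycled short[i]?"
pairwise <- suppressWarnings(long == short)
pairwise     # FALSE FALSE FALSE FALSE FALSE

# `%in%` asks a different question: "does long[i] occur anywhere in short?"
member <- long %in% short
member       # TRUE TRUE FALSE FALSE FALSE
```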
>>
>> For future reference,
>>
>>> dput(weather)
>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>> class = "factor"),
>>    minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>    ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>> c("Date",
>> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>> row.names = c(NA,
>> -4L))
>>> dput(arr)
>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>> class = "factor"),
>>    weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>    5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>    quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>    60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>> structure(c(6L,
>>    8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>    2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>    "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>    19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>    14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>    "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>    "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>    "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>    gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>    TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>    FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class =  
>> "data.frame")
>>
>> These can be copied and pasted directly into an R session without
>> modification.
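The round trip can be checked directly. A minimal sketch, with weather rebuilt by hand from the values above (storing Date as character rather than as a factor is an assumption of this sketch):

```r
weather <- data.frame(Date = "1/1/09", minute = c(5L, 15L, 30L, 45L),
                      hour = 15L, quarter = 60:63,
                      efficiency = c(NA, 72, 63.3, 85.4),
                      stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; capture it as text ...
txt <- capture.output(dput(weather))
# ... and evaluate that text, as a helper would after pasting it
copy <- eval(parse(text = txt))
all.equal(copy, weather)   # TRUE
```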
>>
>> HTH,
>> Dennis
>>
>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesrome at gmail.com> wrote:
>>
>>
>>
>>
>>    On 1/17/10 1:06 PM, David Winsemius wrote:
>>>
>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>
>>>> I don't think it is that simple because it is not a one-to-one match. In
>>>> the arr data frame, there are many arrivals in a quarter hour with good
>>>> weather on a given day. So I need to match the date and the quarter
>>>> hour.
>>>>
>>>> And all of the rows in the weather data frame are times with good
>>>> weather--unique date + quarter hour. That is why I needed the loop. For
>>>> each date and quarter hour in weather, I want to mark all the entries
>>>> with the corresponding date and quarter hour as TRUE in the arr$gw column.
>>>>
>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>> gooddates = function(all, good) {
>>>>   la = length(all)   # All the arrivals
>>>>   lw = length(good)  # The good 15-minute periods
>>>>   for(j in 1:lw) {
>>>>     d=good$Date[j]
>>>>     q=good$quarter[j]
>>>>     all$gw[all$Date==d && all$quarter==q]=TRUE
>>>
>>>
>>> You are attempting a vectorized test and assignment with "&&" which
>>> seems unlikely to succeed, but even then I am not sure your problems
>>> would be over. (I'm also guessing that you might not have reported a
>>> warning.)
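A sketch of the loop with the changes that matter here, on made-up data (values hypothetical; column names follow the thread): & instead of &&, so each row gets its own TRUE/FALSE; nrow() instead of length(), which for a data frame counts columns; and returning the modified data frame, since an R function works on a copy of its argument. A vectorized match on date and quarter together, as suggested in this thread, avoids the loop altogether.

```r
# Hypothetical toy data; column names follow the thread
arr  <- data.frame(Date    = c("1/1/09", "1/1/09", "1/2/09"),
                   quarter = c(60L, 61L, 60L),
                   stringsAsFactors = FALSE)
good <- data.frame(Date    = c("1/1/09", "1/2/09"),
                   quarter = c(61L, 61L),
                   stringsAsFactors = FALSE)

# `&` combines the two tests element by element, one result per row.
# `&&` is meant for single TRUE/FALSE values: R of this era silently used
# only the first element of each side, and R >= 4.3.0 signals an error.
gooddates <- function(all, good) {
  all$gw <- FALSE                        # every arrival starts unflagged
  for (j in seq_len(nrow(good))) {       # nrow() counts rows; length() counts columns
    hit <- all$Date == good$Date[j] & all$quarter == good$quarter[j]
    all$gw[hit] <- TRUE
  }
  all                                    # return the modified copy
}

flagged <- gooddates(arr, good)
flagged$gw   # FALSE TRUE FALSE
```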
>>
>>    Why shouldn't the && succeed? You are correct there, though: I do get
>>    hits if I use either part of this test alone, but when I insert the
>>    &&, I get no hits. And I got no warnings.
>>>
>>> Why not merge arr to gw by date and quarter?
>>    The sets contain different data, and the only thing I want from the
>>    weather set is the fact that it has an entry for a given date and time.
>>>
>>> Answering these questions would be greatly speeded up with a small
>>> sample dataset. Are you aware of the virtues of the dput function?
>>>
>>
>>    What I want is for a 1 to be in the gw column for quarters
>>    60, 61, 62, 63, ...
>>
>>    For example, here is some data from the good weather set:
>>    Date    minute  hour    quarter         Efficiency Val
>>    1/1/09  5       15      60
>>    1/1/09  15      15      61      72
>>    1/1/09  30      15      62      63.3
>>    1/1/09  45      15      63      85.4
>>
>>
>>
>>    And this is from the arrivals set:
>>    DateTime        weekday         month   quarter         ICAO    Flight          gw
>>    1/1/09  5       1       59      COA     COA349          0
>>    1/1/09  5       1       59      NWA     NWA9934         0
>>    1/1/09  5       1       60      JBU     JBU554          0
>>    1/1/09  5       1       60      BTA     BTA2347         0
>>    1/1/09  5       1       60      COA     COA886          0
>>    1/1/09  5       1       60      BTA     BTA2916         0
>>    1/1/09  5       1       60      CJC     CJC3225         0
>>    1/1/09  5       1       60      BTA     BTA2085         0
>>    1/1/09  5       1       60      BTA     BTA2064         0
>>    1/1/09  5       1       60      AAL     AAL842          0
>>    1/1/09  5       1       60      BTA     BTA1234         0
>>    1/1/09  5       1       60      CJC     CJC3359         0
>>    1/1/09  5       1       60      BTA     BTA3072         0
>>    1/1/09  5       1       61      BTA     BTA3086         0
>>    1/1/09  5       1       61      COA     COA1166         0
>>    1/1/09  5       1       61      COA     COA855          0
>>    1/1/09  5       1       61      AWE     AWE307          0
>>    1/1/09  5       1       66      CHQ     CHQ5312         0
>>    1/1/09  5       1       67      BTA     BTA2405         0
>>
>>
>>
>>
>>    ______________________________________________
>>    R-help at r-project.org mailing list
>>    https://stat.ethz.ch/mailman/listinfo/r-help
>>    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>    and provide commented, minimal, self-contained, reproducible code.
>>
>>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


