[R] Comparing dates in dataframes

David Winsemius dwinsemius at comcast.net
Mon Jan 18 01:46:59 CET 2010


But, but, but... there is no weather goodness variable in weather?!?!?!

 > str(weather)
'data.frame':	155 obs. of  4 variables:
  $ Date   :Class 'Date'  num [1:155] 14245 14245 14245 14245 14245 ...
  $ minute : int  5 15 30 45 0 15 30 45 0 15 ...
  $ hour   : int  15 15 15 15 17 17 17 17 18 18 ...
  $ quarter: int  65 75 90 105 68 83 98 113 72 87 ...

I thought you said the "weather" dataframe would have some information
about "goodness" that we were supposed to map to arrivals. What is the
meaning of those variables? How do we define a "good" quarter hour? And
why are the values of quarter not 1, 2, 3, 4? They ought to be a factor
or integer that could be matched to those in "arr", which are also
apparently not so defined. Let's see a better codebook or description
of these variables.
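A quick arithmetic check on the str() output above: the posted quarter values are exactly 4 * hour + minute (raw minutes, not quarter-hours). Jim's earlier sample, where 15:05 through 15:45 map to quarters 60 through 63, instead matches 4 * hour + minute %/% 15. The sketch below assumes that quarter is meant to number the quarter-hours of the day starting from 0, and it also shows the 1 to 4 within-the-hour numbering asked about here. The variable names `qtr_of_day` and `qtr_of_hour` are illustrative only.

```r
# First ten rows of str(weather) above
hour   <- c(15L, 15L, 15L, 15L, 17L, 17L, 17L, 17L, 18L, 18L)
minute <- c(5L, 15L, 30L, 45L, 0L, 15L, 30L, 45L, 0L, 15L)
posted <- c(65L, 75L, 90L, 105L, 68L, 83L, 98L, 113L, 72L, 87L)

# The posted values equal 4 * hour + raw minutes
identical(4L * hour + minute, posted)   # TRUE

# Assumed intent: quarter-hour of the day, 0..95 (15:05 -> 60, 15:45 -> 63)
qtr_of_day <- 4L * hour + minute %/% 15L
qtr_of_day

# Quarter within the hour, 1..4
qtr_of_hour <- minute %/% 15L + 1L
qtr_of_hour
```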

On Jan 17, 2010, at 6:47 PM, James Rome wrote:

> Here are some sample data sets.
>
> I also tried making a combined field in each set such as
> adq=paste(as.character(arr$Date), as.character(arr$quarter))
> and similarly for the weather set, so I have unique single things to
> compare, but that did not seem to help much.
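A combined date + quarter key is a workable idea, but how the keys are compared matters. `==` pairs row i of one data frame with row i of the other, recycling the shorter one. `%in%` asks whether each arrival's key occurs anywhere among the good keys. The sketch below uses small made-up data frames with Date and quarter columns, not Jim's files.

```r
# Toy data: weather lists the good (Date, quarter) pairs; arr lists arrivals
weather <- data.frame(Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
                      quarter = c(60L, 61L, 70L))
arr <- data.frame(Date    = as.Date(c("2009-01-01", "2009-01-01",
                                      "2009-01-02", "2009-01-02")),
                  quarter = c(60L, 70L, 70L, 60L))

# One key per row, built as in the paste() line above
arr_key     <- paste(as.character(arr$Date), as.character(arr$quarter))
weather_key <- paste(as.character(weather$Date), as.character(weather$quarter))

# Membership test: is this arrival's (date, quarter) pair among the good ones?
arr$gw <- as.integer(arr_key %in% weather_key)
arr$gw   # 1 0 1 0
```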
>
> Thanks,
> Jim
>
> On 1/17/10 5:50 PM, David Winsemius wrote:
>> My guess (since we still have no data on which to test these ideas)
>> is that you need either to merge() or to use a matrix created from the
>> dates and qtr-hours entries in "gw", since matching on dates and hours
>> separately will not uniquely classify the good qtr-hours within their
>> proper corresponding dates. You want a structure (or a matching
>> process) that takes:
>>          qhr1   qhr2   qhr3   qhr4   ...
>> date1    good   bad    good   bad
>> date2    bad    good   good   good
>> date3    bad    bad    bad    good
>> .
>> .
>> .
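>> and lets you use the values in "arr" to get values in "gw". Notice
>> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>> simply will not accomplish anything correct: each %in% test is done on
>> its own, so a date and a qtr-hour can each match a different row of
>> "gw".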
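A toy illustration of that point, using made-up data: `gw` stands for the table of good date/qtr-hour pairs, with column names taken from the expression above. The second arrival has a date that occurs in `gw` and a qtr-hour that occurs in `gw`, but never as the same row.

```r
gw  <- data.frame(date  = as.Date(c("2009-01-01", "2009-01-02")),
                  qtrhr = c(60L, 70L))
arr <- data.frame(Date  = as.Date(c("2009-01-01", "2009-01-01")),
                  qtrhr = c(60L, 70L))

# Each condition is checked on its own, so row 2 passes: its date matches
# gw row 1 and its qtr-hour matches gw row 2
separate <- arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr

# Matching the pair jointly gives the intended answer
joint <- paste(arr$Date, arr$qtrhr) %in% paste(gw$date, gw$qtrhr)

separate  # TRUE TRUE
joint     # TRUE FALSE
```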
>>
>> Merging by multiple criteria (with the merge function) would do that,
>> or you could construct a matrix whose entries were the categories
>> good/bad. The table function could create the matrix for the purpose of
>> using an indexed solution if you are dead-set against the merge concept.
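The sketch below shows the indexed (table) route on toy data rather than Jim's files. table() builds a date-by-quarter matrix from the good-weather rows, and each arrival's (date, quarter) pair is then looked up by row and column name. Two details are choices made for this sketch, not part of the suggestion above: arrivals whose date or quarter never appears in weather get NA from match() and are counted as not good, and unclass() turns the table into a plain matrix.

```r
weather <- data.frame(Date    = c("2009-01-01", "2009-01-01", "2009-01-02"),
                      quarter = c(60L, 61L, 70L))
arr <- data.frame(Date    = c("2009-01-01", "2009-01-02", "2009-01-02", "2009-01-03"),
                  quarter = c(61L, 70L, 61L, 60L))

# Rows = dates, columns = quarters; a cell > 0 means that pair is "good"
good <- unclass(table(weather$Date, weather$quarter)) > 0

# Locate each arrival's row and column; NA when the date/quarter is absent
i <- match(as.character(arr$Date), rownames(good))
j <- match(as.character(arr$quarter), colnames(good))

# Two-column matrix indexing pulls one cell per arrival (NA rows give NA)
hit <- good[cbind(i, j)]
arr$gw <- !is.na(hit) & hit
arr$gw   # TRUE TRUE FALSE FALSE
```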
>>
>>
>>
>>
>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>
>>> Thank you Dennis.
>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>> weather$quarter)
>>> seems to be what I want to do, but in fact, with the full data set, it
>>> misidentifies the rows, so I think the error message must mean
>>> something.
>>>
>>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>>> gw = c(length(arrr))
>>>> gw[1:length(arrr[,1])]=FALSE
>>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>> Warning in `==.default`(arr$Date, weather$Date) :
>>> longer object length is not a multiple of shorter object length
>>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>> longer object length is not a multiple of shorter object length
>>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0
>>> 0 0 0 0
>>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0
>>> 0 0 0 0
>>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0
>>> 0 0 0 0
>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0 0
>>> 0 0 0 0
>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0 0
>>> 0 0 0 0
>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0 0
>>> 0 0 0 0
>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
>>> 0 0 0
>>> 0 0 0 0
>>> [260] 0 0 0 0 0 0 0 0
>>>
>>> There are many many more matches in the 99k line arrival data set.
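The warning is the key clue. `==` and `&` work element by element, so `weather$Date == arr$Date` compares row 1 with row 1, row 2 with row 2, and so on. When the lengths differ, R recycles the shorter vector and warns if the longer length is not a multiple of the shorter. It never asks whether an arrival's date appears anywhere in weather. The illustration below uses made-up numbers.

```r
long  <- c(1, 2, 3, 4, 5)
short <- c(2, 1)

warn_msg <- NULL
res <- withCallingHandlers(
  long == short,                        # short is recycled to 2, 1, 2, 1, 2
  warning = function(w) {
    warn_msg <<- conditionMessage(w)    # record the warning text
    invokeRestart("muffleWarning")      # and do not print it again
  }
)
res        # FALSE FALSE FALSE FALSE FALSE: no position lines up
warn_msg   # "longer object length is not a multiple of shorter object length"

# The question actually intended: does each value occur anywhere in short?
long %in% short   # TRUE TRUE FALSE FALSE FALSE
```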
>>>
>>> Thanks a bunch,
>>> Jim
>>>
>>>
>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>>> Hi:
>>>>
>>>> To read a data set from an R-help message into R, one uses
>>>> read.table(textConnection("<verbatim text>"), ...)
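Here is that recipe applied to the weather sample quoted further down. It assumes the two manual fixes described below: the header "Efficiency Val" becomes the single name `efficiency`, and the missing value is written as NA. The Date column is then converted with as.Date() and the "%m/%d/%y" format used elsewhere in the thread. The result stores 1/1/09 as day 14245 counted from 1970-01-01, which is the number shown by str(weather) at the top.

```r
txt <- "Date    minute  hour  quarter  efficiency
1/1/09  5       15    60       NA
1/1/09  15      15    61       72
1/1/09  30      15    62       63.3
1/1/09  45      15    63       85.4"

con <- textConnection(txt)
weather <- read.table(con, header = TRUE)
close(con)

# "1/1/09" -> 2009-01-01 (%y maps two-digit years 00-68 to 20xx)
weather$Date <- as.Date(as.character(weather$Date), format = "%m/%d/%y")
str(weather)
```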
>>>>
>>>> Your weather data set had
>>>> (a) a variable name with a space in it, which R misread and which had
>>>>     to be altered manually;
>>>> (b) a missing value with no NA, which R interpreted as an incomplete
>>>>     line; again, it had to be altered manually.
>>>>
>>>> This is why David suggested the use of dput(), so that these vagaries
>>>> don't have to be dealt with by those who are trying to help.
>>>>
>>>> That being said, for the example that you gave and the desired value
>>>> that you wanted, try
>>>>
>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>> weather$quarter)
>>>>
>>>> (I changed DateTime to Date in the arr data frame...)
>>>>
>>>> You'll get warnings like
>>>>
>>>> Warning messages:
>>>> 1: In is.na(e1) | is.na(e2) :
>>>> longer object length is not a multiple of shorter object length
>>>>
>>>> but it seems to do the right thing. The first equality is there to
>>>> constrain matches for quarter to be within the same day.
>>>>
>>>> For future reference,
>>>>
>>>>> dput(weather)
>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>>   class = "factor"), minute = c(5L, 15L, 30L, 45L),
>>>>   hour = c(15L, 15L, 15L, 15L), quarter = 60:63,
>>>>   efficiency = c(NA, 72, 63.3, 85.4)), .Names = c("Date", "minute",
>>>>   "hour", "quarter", "efficiency"), class = "data.frame",
>>>>   row.names = c(NA, -4L))
>>>>> dput(arr)
>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>>   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>>   class = "factor"),
>>>>   weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>>>   5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>>>   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>>   quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>>>   60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L),
>>>>   ICAO = structure(c(6L, 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L,
>>>>   5L, 3L, 3L, 6L, 6L, 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA",
>>>>   "CHQ", "CJC", "COA", "JBU", "NWA"), class = "factor"),
>>>>   Flight = structure(c(15L, 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L,
>>>>   3L, 13L, 9L, 10L, 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842",
>>>>   "AWE307", "BTA1234", "BTA2064", "BTA2085", "BTA2347", "BTA2405",
>>>>   "BTA2916", "BTA3072", "BTA3086", "CHQ5312", "CJC3225", "CJC3359",
>>>>   "COA1166", "COA349", "COA855", "COA886", "JBU554", "NWA9934"),
>>>>   class = "factor"),
>>>>   gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>>   TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>>>   FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>>>   "ICAO", "Flight", "gw"), row.names = c(NA, -19L),
>>>>   class = "data.frame")
>>>>
>>>> These can be copied and pasted directly into an R session without
>>>> modification.
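The following shows why this works. The text that dput() prints is ordinary R code, so evaluating it rebuilds the object. In the sketch, deparse() produces the same representation as a character vector and stands in for copying the printed text out of an e-mail. The data frame is the weather sample above, rebuilt by hand.

```r
weather <- data.frame(Date       = factor(rep("1/1/09", 4)),
                      minute     = c(5L, 15L, 30L, 45L),
                      hour       = rep(15L, 4),
                      quarter    = 60:63,
                      efficiency = c(NA, 72, 63.3, 85.4))

dput(weather)                     # what a poster would paste into a message

txt  <- deparse(weather)          # the same text, captured as strings
copy <- eval(parse(text = txt))   # what a helper gets after pasting it

all.equal(copy, weather)          # TRUE: factor levels, integers and NA survive
```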
>>>>
>>>> HTH,
>>>> Dennis
>>>>
>>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesrome at gmail.com> wrote:
>>>>
>>>>   On 1/17/10 1:06 PM, David Winsemius wrote:
>>>>>
>>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>>>
>>>>>> I don't think it is that simple because it is not a one-to-one match.
>>>>>> In the arr data frame, there are many arrivals in a quarter hour with
>>>>>> good weather on a given day. So I need to match the date and the
>>>>>> quarter hour.
>>>>>>
>>>>>> And all of the rows in the weather data frame are times with good
>>>>>> weather, each a unique date + quarter hour. That is why I needed the
>>>>>> loop. For each date and quarter hour in weather, I want to mark all
>>>>>> the entries with the corresponding date and quarter hour as TRUE in
>>>>>> the arr$gw column.
>>>>>>
>>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>>>> gooddates = function(all, good) {
>>>>>> la = length(all)   # All the arrivals
>>>>>> lw = length(good)  # The good 15-minute periods
>>>>>> for(j in 1:lw) {
>>>>>>  d=good$Date[j]
>>>>>>  q=good$quarter[j]
>>>>>>  all$gw[all$Date==d && all$quarter==q]=TRUE
>>>>>
>>>>>
>>>>> You are attempting a vectorized test and assignment with "&&", which
>>>>> seems unlikely to succeed, but even then I am not sure your problems
>>>>> would be over. (I'm also guessing that you might not have reported a
>>>>> warning.)
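Below is a minimal corrected version of that loop, on toy data. It assumes `all` and `good` are data frames whose Date columns have the same class. Three things change: nrow() replaces length() (length() of a data frame counts columns, not rows), the element-wise `&` replaces `&&`, and the function returns the modified data frame, because R functions work on a copy of their arguments. The argument name `all` is kept from the original even though it masks base::all() inside the function.

```r
gooddates <- function(all, good) {
  all$gw <- FALSE                       # start with every arrival marked bad
  for (j in seq_len(nrow(good))) {      # nrow(), not length(): rows, not columns
    d <- good$Date[j]
    q <- good$quarter[j]
    # '&' compares element by element; '&&' expects single values
    all$gw[all$Date == d & all$quarter == q] <- TRUE
  }
  all                                   # return the modified copy
}

good <- data.frame(Date = as.Date(c("2009-01-01", "2009-01-01")),
                   quarter = c(60L, 61L))
arr  <- data.frame(Date = as.Date(rep("2009-01-01", 4)),
                   quarter = c(59L, 60L, 61L, 66L))
arr <- gooddates(arr, good)
arr$gw   # FALSE TRUE TRUE FALSE
```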
>>>>
>>>>   Why shouldn't the && succeed? You are correct there, though: I do
>>>>   get items if I use either part of this test on its own, but when I
>>>>   insert the &&, I get no hits. And I got no warnings.
>>>>>
>>>>> Why not merge arr to gw by date and quarter?
>>>>   The sets contain different data, and the only thing I want from the
>>>>   weather set is the fact that it has an entry for a given date and
>>>>   time.
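merge() does not have to bring the other weather columns along. If you merge on just the key columns with all.x = TRUE, every arrival is kept and only a flag is added. The sketch uses toy data, and the flag column name `good` is hypothetical. By default (sort = TRUE) merge() reorders the result by the key columns, so compare results by key rather than by row position.

```r
weather <- data.frame(Date = as.Date(c("2009-01-01", "2009-01-01")),
                      quarter = c(60L, 61L),
                      efficiency = c(NA, 72))          # extra column, not wanted
arr <- data.frame(Date = as.Date(rep("2009-01-01", 4)),
                  quarter = c(59L, 60L, 61L, 66L),
                  Flight = c("COA349", "JBU554", "BTA3086", "CHQ5312"))

keys <- unique(weather[, c("Date", "quarter")])        # keep only the keys
keys$good <- TRUE                                       # hypothetical flag column

out <- merge(arr, keys, by = c("Date", "quarter"), all.x = TRUE)
out$good <- !is.na(out$good)                            # unmatched arrivals -> FALSE
out[, c("Flight", "quarter", "good")]
```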
>>>>>
>>>>> Answering these questions would be greatly speeded up with a small
>>>>> sample dataset. Are you aware of the virtues of the dput function?
>>>>>
>>>>
>>>>   What I want is for a 1 to be in the gw column for the arrivals in
>>>>   quarters 60, 61, 62, 63, ...
>>>>
>>>>   For example, here is some data from the good weather set:
>>>>   Date    minute  hour    quarter         Efficiency Val
>>>>   1/1/09  5       15      60
>>>>   1/1/09  15      15      61      72
>>>>   1/1/09  30      15      62      63.3
>>>>   1/1/09  45      15      63      85.4
>>>>
>>>>
>>>>
>>>>   And this is from the arrivals set:
>>>>   DateTime  weekday  month  quarter  ICAO  Flight   gw
>>>>   1/1/09    5        1      59       COA   COA349   0
>>>>   1/1/09    5        1      59       NWA   NWA9934  0
>>>>   1/1/09    5        1      60       JBU   JBU554   0
>>>>   1/1/09    5        1      60       BTA   BTA2347  0
>>>>   1/1/09    5        1      60       COA   COA886   0
>>>>   1/1/09    5        1      60       BTA   BTA2916  0
>>>>   1/1/09    5        1      60       CJC   CJC3225  0
>>>>   1/1/09    5        1      60       BTA   BTA2085  0
>>>>   1/1/09    5        1      60       BTA   BTA2064  0
>>>>   1/1/09    5        1      60       AAL   AAL842   0
>>>>   1/1/09    5        1      60       BTA   BTA1234  0
>>>>   1/1/09    5        1      60       CJC   CJC3359  0
>>>>   1/1/09    5        1      60       BTA   BTA3072  0
>>>>   1/1/09    5        1      61       BTA   BTA3086  0
>>>>   1/1/09    5        1      61       COA   COA1166  0
>>>>   1/1/09    5        1      61       COA   COA855   0
>>>>   1/1/09    5        1      61       AWE   AWE307   0
>>>>   1/1/09    5        1      66       CHQ   CHQ5312  0
>>>>   1/1/09    5        1      67       BTA   BTA2405  0
>>>>
>>>>   ______________________________________________
>>>>   R-help at r-project.org mailing list
>>>>   https://stat.ethz.ch/mailman/listinfo/r-help
>>>>   PLEASE do read the posting guide
>>>>   http://www.R-project.org/posting-guide.html
>>>>   and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
> <arr.rda><weather.rda>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


