[R] Comparing dates in dataframes
James Rome
jamesrome at gmail.com
Mon Jan 18 02:17:45 CET 2010
Any entry in the weather data is a good day. That is the point. And
please ignore my mistake about the quarters getting too large in
weather. I am being swamped with versions, and it does not matter for
this purpose. So the bad-weather days are simply not in the weather data
set. I am trying to set gw=1 in each row of arr whose date and quarter
both appear together in some row of weather.
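
One way to express that goal is to compare a combined date-plus-quarter key with %in% rather than with ==. The following is only a minimal sketch: the toy data frames are made up to resemble the samples posted in this thread, and the names key_arr and key_weather are hypothetical.

```r
# Toy data modeled on the samples in this thread (values are illustrative only).
weather <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(60L, 61L, 60L)
)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01",
                      "2009-01-02", "2009-01-02")),
  quarter = c(59L, 60L, 61L, 60L, 61L)
)

# One key per (Date, quarter) pair, so the pair is matched as a unit.
key_arr     <- paste(arr$Date, arr$quarter)
key_weather <- paste(weather$Date, weather$quarter)

# %in% yields one TRUE/FALSE per row of arr, regardless of nrow(weather).
arr$gw <- as.integer(key_arr %in% key_weather)
print(arr)
```

Because %in% tests membership instead of comparing position by position, the two data frames do not need to have the same number of rows.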
Thanks,
Jim
On 1/17/10 7:46 PM, David Winsemius wrote:
> But, but, but .... there is no weather goodness variable in weather?!?!?!
>
> > str(weather)
> 'data.frame': 155 obs. of 4 variables:
> $ Date :Class 'Date' num [1:155] 14245 14245 14245 14245 14245 ...
> $ minute : int 5 15 30 45 0 15 30 45 0 15 ...
> $ hour : int 15 15 15 15 17 17 17 17 18 18 ...
> $ quarter: int 65 75 90 105 68 83 98 113 72 87 ...
>
> I thought you said the "weather" dataframe would have some information
> about "goodness" that we were supposed to map to arrivals. What is
> the meaning of those variables? How do we define a "good" quarter
> hour? And why are the values of quarter not 1, 2, 3, 4? They ought to
> be a factor or integer that could be matched to those that are in
> "arr", which are also apparently not so defined. Let's see a better
> codebook or description of these variables.
>
> On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>
>> Here are some sample data sets.
>>
>> I also tried making a combined field in each set such as
>> adq=paste(as.character(arr$Date), as.character(arr$quarter))
>> and similarly for the weather set, so I have unique single things to
>> compare, but that did not seem to help much.
>>
>> Thanks,
>> Jim
>>
>> On 1/17/10 5:50 PM, David Winsemius wrote:
>>> My guess (since we still have no data on which to test these ideas)
>>> is that you need either to merge() or to use a matrix created from the
>>> dates and qtr-hours entries in "gw", since matching on dates and hours
>>> separately will not uniquely classify the good qtr-hours within their
>>> proper corresponding dates. You want a structure (or a matching
>>> process) that takes:
>>>          qhr1  qhr2  qhr3  qhr4  .......
>>> date1    good  bad   good  bad
>>> date2    bad   good  good  good
>>> date3    bad   bad   bad   good
>>> .
>>> .
>>> .
>>> and lets you use the values in "arr" to get values in "gw". Notice
>>> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>>> simply will not accomplish anything correct.
>>>
>>> Merging by multiple criteria (with the merge function) would do that,
>>> or you could construct a matrix whose entries were the categories
>>> good/bad. The table function could create the matrix for the purpose
>>> of using an indexed solution if you are dead-set against the merge
>>> concept.
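
For reference, the merge-by-two-columns idea could look roughly like the sketch below. It assumes both data frames carry columns named Date and quarter; the toy data, the Flight labels, and the helper name good are made up for illustration.

```r
# Toy data (illustrative only); several arrivals can share one quarter hour.
weather <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(60L, 61L, 60L)
)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01",
                      "2009-01-01", "2009-01-02", "2009-01-02")),
  quarter = c(59L, 60L, 60L, 61L, 60L, 61L),
  Flight  = c("F1", "F2", "F3", "F4", "F5", "F6")
)

# Keep one row per good (Date, quarter) pair and flag it.
good    <- unique(weather[, c("Date", "quarter")])
good$gw <- 1L

# Left join: every arrival is kept; arrivals without a match get NA.
out <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
out$gw[is.na(out$gw)] <- 0L
print(out)
```

Note that merge() may return the rows in a different order than arr, so the new column should be read from out rather than copied back by position.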
>>>
>>>
>>>
>>>
>>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>>
>>>> Thank you Dennis.
>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>> weather$quarter)
>>>> seems to be what I want to do, but in fact, with the full data set, it
>>>> misidentifies the rows, so I think the error message must mean
>>>> something.
>>>>
>>>>> arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>>>>> weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>>>>> gw = c(length(arrr))
>>>>> gw[1:length(arrr[,1])]=FALSE
>>>>> gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>>> Warning in `==.default`(arr$Date, weather$Date) :
>>>> longer object length is not a multiple of shorter object length
>>>> Warning in arr$Date == weather$Date & weather$quarter %in%
>>>> arr$quarter :
>>>> longer object length is not a multiple of shorter object length
>>>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0 0 0
>>>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0 0 0
>>>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0 0 0
>>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0
>>>> 0 0 0 0
>>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0
>>>> 0 0 0 0
>>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0
>>>> 0 0 0 0
>>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>> 0 0
>>>> 0 0 0 0
>>>> [260] 0 0 0 0 0 0 0 0
>>>>
>>>> There are many many more matches in the 99k line arrival data set.
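
The "longer object length is not a multiple of shorter object length" warning comes from R recycling the shorter vector when == is applied to vectors of different lengths. A small sketch of the difference between == and %in%, using made-up dates and the hypothetical names a, w, positional and membership:

```r
a <- as.Date("2009-01-01") + c(0, 0, 1, 1, 2)  # five "arrival" dates
w <- as.Date("2009-01-01") + c(1, 0)           # two "weather" dates

# == compares position by position and recycles w: a[1] vs w[1],
# a[2] vs w[2], a[3] vs w[1], ... It warns because 5 is not a multiple of 2.
positional <- suppressWarnings(a == w)

# %in% asks, for each element of a, whether it occurs anywhere in w.
membership <- a %in% w

print(rbind(positional, membership))
```

So a TRUE from == only says that two values happened to line up after recycling, which is presumably why the rows were misidentified on the full data set.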
>>>>
>>>> Thanks a bunch,
>>>> Jim
>>>>
>>>>
>>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>>>> Hi:
>>>>>
>>>>> To read a data set from a R-help message into R, one uses
>>>>> read.table(textConnection("<verbatim text>"), ...)
>>>>>
>>>>> Your weather data set had
>>>>> (a) a variable name with a space in it, that R misread and had to be
>>>>>     altered manually;
>>>>> (b) a missing value with no NA that R interpreted as an incomplete
>>>>>     line; again, it had to be altered manually.
>>>>>
>>>>> This is why David suggested the use of dput(), so that these vagaries
>>>>> don't have to be dealt with by those who are trying to help.
>>>>>
>>>>> That being said, for the example that you gave and the desired value
>>>>> that you wanted, try
>>>>>
>>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>>>> weather$quarter)
>>>>>
>>>>> (I changed DateTime to Date in the arr data frame...)
>>>>>
>>>>> You'll get warnings like
>>>>>
>>>>> Warning messages:
>>>>> 1: In is.na(e1) | is.na(e2) :
>>>>> longer object length is not a multiple of shorter object length
>>>>>
>>>>> but it seems to do the right thing. The first equality is there to
>>>>> constrain matches for
>>>>> quarter to be within the same day.
>>>>>
>>>>> For future reference,
>>>>>
>>>>>> dput(weather)
>>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>>> class = "factor"),
>>>>> minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>>>> ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>>>> c("Date",
>>>>> "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>>>>> row.names = c(NA,
>>>>> -4L))
>>>>>> dput(arr)
>>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>>>> class = "factor"),
>>>>> weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>>>> 5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>>>> quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>>>> 60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO =
>>>>> structure(c(6L,
>>>>> 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>>>> 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>>>> "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>>>> 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>>>> 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>>>> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>>>> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>>>> "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>>>> gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>>> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>>>> FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>>>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class =
>>>>> "data.frame")
>>>>>
>>>>> These can be copied and pasted directly into an R session without
>>>>> modification.
>>>>>
>>>>> HTH,
>>>>> Dennis
>>>>>
>>>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesrome at gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 1/17/10 1:06 PM, David Winsemius wrote:
>>>>>>
>>>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>>>>
>>>>>>> I don't think it is that simple because it is not a one-to-one
>>>>>>> match. In the arr data frame, there are many arrivals in a quarter
>>>>>>> hour with good weather on a given day. So I need to match the date
>>>>>>> and the quarter hour.
>>>>>>>
>>>>>>> And all of the rows in the weather data frame are times with good
>>>>>>> weather--unique date + quarter hour. That is why I needed the
>>>>>>> loop. For each date and quarter hour in weather, I want to mark
>>>>>>> all the entries with the corresponding date and quarter as TRUE
>>>>>>> in the arr$gw column.
>>>>>>>
>>>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>>>>> gooddates = function(all, good) {
>>>>>>> la = length(all) # All the arrivals
>>>>>>> lw = length(good) # The good 15-minute periods
>>>>>>> for(j in 1:lw) {
>>>>>>> d=good$Date[j]
>>>>>>> q=good$quarter[j]
>>>>>>> all$gw[all$Date==d && all$quarter==q]=TRUE
>>>>>>
>>>>>>
>>>>>> You are attempting a vectorized test and assignment with "&&" which
>>>>>> seems unlikely to succeed, but even then I am not sure your problems
>>>>>> would be over. (I'm also guessing that you might not have reported a
>>>>>> warning.)
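
To illustrate the point about "&&": "&" works element by element and returns one logical value per row, while "&&" is meant for single values. In R of this era "&&" silently used only the first element of each side; recent R versions (4.3 and later) signal an error instead. The sketch below uses made-up vectors and hypothetical names (hit, first_only).

```r
Date    <- c("1/1/09", "1/1/09", "1/2/09", "1/2/09")
quarter <- c(60L, 61L, 60L, 61L)
gw      <- rep(FALSE, length(Date))

d <- "1/2/09"
q <- 60L

# & is element-wise: one TRUE/FALSE per row, usable as a logical index.
hit <- Date == d & quarter == q
gw[hit] <- TRUE

# What && effectively tested in older R: the first elements only.
# A single FALSE used as an index selects nothing, hence "no hits".
first_only <- Date[1] == d & quarter[1] == q

print(gw)
print(first_only)
```

With a single FALSE as the index, the assignment inside the loop changes no rows at all, and older R gave no warning about it.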
>>>>>
>>>>> Why shouldn't the && succeed? You are correct there, because I do
>>>>> get items if I use either part of this "and" test on its own, but
>>>>> when I insert the &&, I get no hits. And I got no warnings.
>>>>>>
>>>>>> Why not merge arr to gw by date and quarter?
>>>>> The sets contain different data, and the only thing I want from the
>>>>> weather set is the fact that it has an entry for a given date and
>>>>> time.
>>>>>>
>>>>>> Answering these questions would be greatly speeded up with a small
>>>>>> sample dataset. Are you aware of the virtues of the dput function?
>>>>>>
>>>>>
>>>>> What I want is for a 1 to be in the gw column in the quarter
>>>>> 60,61,62,63,...
>>>>>
>>>>> For example, here is some data from the good weather set:
>>>>> Date    minute  hour  quarter  Efficiency Val
>>>>> 1/1/09  5       15    60
>>>>> 1/1/09  15      15    61       72
>>>>> 1/1/09  30      15    62       63.3
>>>>> 1/1/09  45      15    63       85.4
>>>>>
>>>>>
>>>>>
>>>>> And this is from the arrivals set:
>>>>> DateTime  weekday  month  quarter  ICAO  Flight   gw
>>>>> 1/1/09    5        1      59       COA   COA349   0
>>>>> 1/1/09    5        1      59       NWA   NWA9934  0
>>>>> 1/1/09    5        1      60       JBU   JBU554   0
>>>>> 1/1/09    5        1      60       BTA   BTA2347  0
>>>>> 1/1/09    5        1      60       COA   COA886   0
>>>>> 1/1/09    5        1      60       BTA   BTA2916  0
>>>>> 1/1/09    5        1      60       CJC   CJC3225  0
>>>>> 1/1/09    5        1      60       BTA   BTA2085  0
>>>>> 1/1/09    5        1      60       BTA   BTA2064  0
>>>>> 1/1/09    5        1      60       AAL   AAL842   0
>>>>> 1/1/09    5        1      60       BTA   BTA1234  0
>>>>> 1/1/09    5        1      60       CJC   CJC3359  0
>>>>> 1/1/09    5        1      60       BTA   BTA3072  0
>>>>> 1/1/09    5        1      61       BTA   BTA3086  0
>>>>> 1/1/09    5        1      61       COA   COA1166  0
>>>>> 1/1/09    5        1      61       COA   COA855   0
>>>>> 1/1/09    5        1      61       AWE   AWE307   0
>>>>> 1/1/09    5        1      66       CHQ   CHQ5312  0
>>>>> 1/1/09    5        1      67       BTA   BTA2405  0
>>>>>
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>> <arr.rda><weather.rda>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>