[R] Comparing dates in dataframes
David Winsemius
dwinsemius at comcast.net
Sun Jan 17 19:06:39 CET 2010
On Jan 17, 2010, at 12:37 PM, James Rome wrote:
> I don't think it is that simple because it is not a one-to-one match. In
> the arr data frame, there are many arrivals in a quarter hour with good
> weather on a given day. So I need to match the date and the quarter hour.
>
> And all of the rows in the weather data frame are times with good
> weather--unique date + quarter hour. That is why I needed the loop. For
> each date and quarter hour in weather, I want to mark all the entries
> with the corresponding date and weather as TRUE in the arr$gw column.
>
> I did convert the dates to POSIXlt dates and rewrote my function as
> gooddates = function(all, good) {
> la = length(all) # All the arrivals
> lw = length(good) # The good 15-minute periods
> for(j in 1:lw) {
> d=good$Date[j]
> q=good$quarter[j]
> all$gw[all$Date==d && all$quarter==q]=TRUE
You are attempting a vectorized test and assignment with "&&", which is
not a vectorized operator and so seems unlikely to succeed; the
elementwise operator is "&". Even with that changed, I am not sure your
problems would be over. (I'm also guessing that there may have been a
warning that you did not report.)
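
A small sketch of the difference, using made-up vectors (`dates` and `quarters` are hypothetical stand-ins for all$Date and all$quarter, not your data). "&" works elementwise and returns one logical value per row, which is what a subscript needs. "&&" is meant for single values: depending on the R version it either looks only at the first elements or stops with an error.

```r
# Hypothetical stand-ins for all$Date and all$quarter
dates    <- as.Date(c("2009-01-01", "2009-01-01", "2009-01-02"))
quarters <- c(32, 33, 32)

# One good-weather period to look for
d <- as.Date("2009-01-01")
q <- 32

# Elementwise AND: one TRUE/FALSE per row, usable as a row index
hit <- dates == d & quarters == q

# Start with all FALSE and flag only the matching rows
gwflag <- rep(FALSE, length(dates))
gwflag[hit] <- TRUE
```

Only the first element matches both the date and the quarter hour, so only that position is flagged.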
Why not merge arr to gw by date and quarter?
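
A minimal sketch of what such a merge might look like, on made-up miniature data frames. It assumes Date has already been converted to class Date and that quarter is an integer index of the quarter hour; the Flight values and the name `arr2` are invented for illustration.

```r
# Made-up miniature versions of the two data frames
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(32L, 32L, 33L, 32L),
  Flight  = c("AA1", "DL2", "UA3", "AA1"),
  stringsAsFactors = FALSE
)
gw <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-02")),
  quarter = c(32L, 40L)
)

# One row per good-weather period, plus a marker column
good <- unique(gw[, c("Date", "quarter")])
good$gw <- TRUE

# Left join: keep every arrival; arrivals with no match get NA in gw
arr2 <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
arr2$gw[is.na(arr2$gw)] <- FALSE
```

Because every arrival in a matching date and quarter hour picks up the marker, the many-to-one relationship is handled without a loop. If arr already carries a gw column, it would need to be dropped before the merge; otherwise merge() keeps both copies under suffixed names.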
Answering these questions would go much faster with a small sample
dataset. Are you aware of the virtues of the dput function?
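
For instance, a sketch of how dput could be used to share a few rows (the data frame `arr_small` and its values are made up for illustration):

```r
# Made-up data frame standing in for the first rows of the real data
arr_small <- data.frame(
  Date    = c("1/1/09", "1/1/09", "1/2/09"),
  quarter = c(32L, 33L, 32L),
  stringsAsFactors = FALSE
)

# dput() writes an R expression that recreates the object; here the
# text is captured so it can be pasted into a message
txt <- capture.output(dput(head(arr_small, 3)))

# A reader can rebuild the object by evaluating that text
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))
all.equal(rebuilt, arr_small)
```

The pasted text carries the column classes along with the values, so a factor-versus-Date confusion would be visible to anyone reading the message.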
--
David
> }
> }
>
> Now it runs with no errors, but none of the 0s (FALSE) in arr$gw get
> replaced with 1s. So I am still doing something wrong.
>
> Thanks,
> Jim
>
> On 1/16/10 6:11 PM, jim holtman wrote:
>> If you have a vector of the quarter hours of good weather (gw), then
>> to create the column in the arr dataframe you would do
>>
>> arr$GoodWeather <- arr$quarter %in% gw
>>
>> This says that if the quarter hour of the arrival is in the 'gw'
>> vector, set the value TRUE; otherwise FALSE.
>>
>>
>> On 1/16/10 4:26 PM, Stephan Kolassa wrote:
>>> Hi,
>>>
>>> it looks like when you read in your data.frames, you didn't tell R to
>>> expect dates, so it treats the Date columns as factors. Judicious use
>>> of something along these lines before doing your comparisons may help:
>>>
>>> arr$Date <- as.Date(as.character(arr$Date),format=something)
>>>
>>> Then again, it may be possible to do the actual merging using merge().
>>>
>>> HTH
>>> Stephan
>>>
>>>
>>> James Rome schrieb:
>>>> I have two data frames. One (arr) has all arrivals to an airport for a
>>>> year, and the other (gw) has the dates and quarter hour of the day when
>>>> the weather is good. arr has a Date and quarter hour column.
>>>>
>>>>> names(arr)
>>>>  [1] "Date"         "weekday"      "hour"         "month"        "minute"
>>>>  [6] "quarter"      "ICAO"         "Flight"       "AircraftType" "Tail"
>>>> [11] "Arrived"      "STA"          "Runway"       "FromTo"       "Delay"
>>>> [16] "Operator"     "gw"
>>>> I added the gw column to arr and initialized it to all FALSE
>>>>
>>>>> names(gw)
>>>>  [1] "Date"           "minute"         "hour"           "quarter"
>>>>  [5] "Efficiency.Val" "Weekly.Avg"     "Arrival.Val"    "Weekly.Avg.1"
>>>>  [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold"    "Runway"
>>>> [13] "Weather"
>>>> First point of confusion:
>>>>> gw[1,1]
>>>> [1] 1/1/09
>>>> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
>>>> Why do I get 353 levels?
>>>>
>>>> I am trying to identify the quarter hours with good weather in the arr
>>>> data frame. What I want to do is to go through the rows in gw, and to
>>>> set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw
>>>> row.
>>>>
>>>> So I tried
>>>> gooddates = function(all, good) {
>>>> la = length(all) # All the flights
>>>> lw = length(good) # The good 15-minute periods
>>>> for(j in 1:lw) {
>>>> d=good$Date[j]
>>>> q=good$quarter[j]
>>>> all[all$DateTime==d && all$quarter==q,17]=TRUE
>>>> }
>>>> }
>>>>
>>>> but when I run this, I get
>>>> "Error in Ops.factor(all$DateTime, d) :
>>>> level sets of factors are different"
>>>>
>>>> I know the level sets are different, that is what I am trying to find.
>>>> But I think I am comparing single elements from the data frames.
>>>>
>>>> So what am I doing wrong? And there ought to be a better way to do this.
>>>>
>>>> Thanks in advance,
>>>> Jim Rome
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT