[R] Comparing dates in dataframes

James Rome jamesrome at gmail.com
Sat Jan 16 23:22:58 CET 2010


   I don't want to merge the data frames because there are many entries
in the arrival frame for each one in the weather frame. And it is the
missing dates and quarters in the weather frame that constitute the data
I want, namely those arrivals that occurred in bad (or good) weather.
   But I will try converting the dates tomorrow, as suggested.
   Is there a way to do what I want without that for loop? There are
almost 100,000 rows in the arrivals frame, and R is grinding to a halt.
   And is there a way to get R to abort its current calculation? Ctrl-C
and Esc do not seem to work.
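One loop-free approach, offered as a sketch under stated assumptions: build one key per row by pasting the date and the quarter together, then test every arrival key against the set of good-weather keys with a single vectorized %in% call. The toy data frames below are hypothetical stand-ins for arr and gw, and the sketch assumes both frames write dates in the same text format and number quarters the same way; otherwise the keys will not match.

```r
# Hypothetical toy data standing in for the real arr and gw frames.
arr <- data.frame(Date = c("1/1/09", "1/1/09", "1/2/09"),
                  quarter = c(1, 1, 3),
                  Flight = c("A1", "B2", "C3"),
                  stringsAsFactors = FALSE)
gw <- data.frame(Date = "1/1/09", quarter = 1, stringsAsFactors = FALSE)

# One "date quarter" key per row. paste() applies as.character() to its
# arguments, so factor columns contribute their labels and differing
# level sets cause no error.
arr_key <- paste(arr$Date, arr$quarter)
gw_key <- paste(gw$Date, gw$quarter)

# Vectorized membership test: TRUE where the arrival's date and quarter
# appear among the good-weather periods.
arr$gw <- arr_key %in% gw_key
print(arr$gw)  # TRUE TRUE FALSE
```

%in% is built on match(), which hashes its lookup table, so this should cope with about 100,000 arrivals much better than one full-column comparison per gw row.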

Thanks,
Jim

On 1/16/10 4:26 PM, Stephan Kolassa wrote:
> Hi,
>
> it looks like when you read in your data.frames, you didn't tell R to
> expect dates, so it treats the Date columns as factors. Judicious use
> of something along these lines before doing your comparisons may help:
>
> arr$Date <- as.Date(as.character(arr$Date),format=something)
>
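The "something" is left open above; as a guess, the values shown further down (1/1/09, 1/10/09, 9/9/09) look like month/day/two-digit year, which is "%m/%d/%y" in R's strptime notation. Under that assumption, with a hypothetical factor standing in for the Date column:

```r
# Hypothetical factor mimicking a Date column that was read in as strings.
d <- factor(c("1/1/09", "1/10/09", "12/31/09", "1/1/10"))

# Assumes month/day/two-digit year; use "%d/%m/%y" if the data are
# day-first. On input, %y maps 00-68 to 20xx and 69-99 to 19xx.
dates <- as.Date(as.character(d), format = "%m/%d/%y")
print(dates)  # 2009-01-01, 2009-01-10, 2009-12-31, 2010-01-01
```

Once converted, Date values compare with == and < like numbers, with no level sets involved. Passing stringsAsFactors = FALSE to read.csv() keeps the column as character in the first place.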
> Then again, it may be possible to do the actual merging using merge().
>
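On the merge() route, a sketch under assumptions: a many-to-one merge does not multiply arrival rows as long as each (Date, quarter) pair appears at most once on the weather side, and all.x = TRUE keeps arrivals that have no good-weather match. The toy data and the marker column name good are hypothetical.

```r
# Hypothetical toy data standing in for arr and gw.
arr <- data.frame(Date = c("1/1/09", "1/1/09", "1/2/09"),
                  quarter = c(1, 1, 3),
                  Flight = c("A1", "B2", "C3"),
                  stringsAsFactors = FALSE)
gw <- data.frame(Date = c("1/1/09", "1/1/09"), quarter = c(1, 1),
                 stringsAsFactors = FALSE)

# Keep each good-weather (Date, quarter) pair once so the merge cannot
# duplicate arrival rows, and tag it with a marker column.
keys <- unique(gw[, c("Date", "quarter")])
keys$good <- TRUE

# Left join: every arrival is kept; unmatched arrivals get NA in 'good'.
m <- merge(arr, keys, by = c("Date", "quarter"), all.x = TRUE)
m$good[is.na(m$good)] <- FALSE
```

By default merge() sorts the result by the key columns, so the row order of m may differ from that of arr.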
> HTH
> Stephan
>
>
> James Rome wrote:
>> I have two data frames. One (arr) has all arrivals to an airport for a
>> year, and the other (gw) has the dates and quarter-hours of the day when
>> the weather is good. arr has Date and quarter (quarter-hour) columns.
>>
>>> names(arr)
>>  [1] "Date"         "weekday"      "hour"         "month"        "minute"
>>  [6] "quarter"      "ICAO"         "Flight"       "AircraftType" "Tail"
>> [11] "Arrived"      "STA"          "Runway"       "FromTo"       "Delay"
>> [16] "Operator"     "gw"
>>
>> I added the gw column to arr and initialized it to all FALSE.
>>
>>> names(gw)
>>  [1] "Date"           "minute"         "hour"           "quarter"
>>  [5] "Efficiency.Val" "Weekly.Avg"     "Arrival.Val"    "Weekly.Avg.1"
>>  [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold"    "Runway"
>> [13] "Weather"
>>
>> First point of confusion:
>>> gw[1,1]
>> [1] 1/1/09
>> 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
>>
>> Why do I get 353 levels?
>>
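The 353 is most likely just the number of distinct date strings in the column: before R 4.0.0, read.csv() and read.table() converted character columns to factors by default, and every distinct string becomes one level. Printing a single element still lists the whole level set, and the levels are sorted as strings rather than as dates. A small illustration with hypothetical values:

```r
# Hypothetical date strings, as they might be read from a CSV file.
x <- c("1/1/09", "1/10/09", "1/1/09", "1/2/09", "1/10/09")
f <- factor(x)

nlevels(f)     # 3: one level per distinct string, not per row
nlevels(f[1])  # still 3: subsetting a factor keeps all of its levels
levels(f)      # sorted as strings, so 1/10/09 comes before 1/2/09
```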
>> I am trying to identify the quarter-hours with good weather in the arr
>> data frame. What I want to do is go through the rows of gw and set
>> arr$gw to TRUE wherever arr$Date and arr$quarter match the Date and
>> quarter in that gw row.
>>
>> So I tried
>> gooddates = function(all, good) {
>>   la = length(all)   # All the flights
>>   lw = length(good)  # The good 15-minute periods
>>   for(j in 1:lw) {
>>     d=good$Date[j]
>>     q=good$quarter[j]
>>     all[all$DateTime==d && all$quarter==q,17]=TRUE
>>   }
>> }
>>
>> but when I run this, I get
>> "Error in Ops.factor(all$DateTime, d) :
>>   level sets of factors are different"
>>
>> I know the level sets are different; that is what I am trying to find.
>> But I think I am comparing single elements from the data frames.
>>
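This is most likely where the error comes from: good$Date[j] is a length-one factor that still carries every level of gw$Date, and == between two factors stops unless both have the same level set, whatever their lengths. Comparing the labels as character strings avoids that check. A minimal reproduction, with hypothetical values and names:

```r
a <- factor(c("1/1/09", "1/2/09"))            # stands in for arr's date column
g <- factor(c("1/1/09", "1/3/09", "1/4/09"))  # stands in for gw$Date
one <- g[1]                                   # one element, three levels

# Factor-to-factor '==' checks the level sets first, so this fails with
# "level sets of factors are different".
res <- tryCatch(a == one, error = function(e) e)
inherits(res, "error")  # TRUE

# Comparing the labels instead works element by element.
as.character(a) == as.character(one)  # TRUE FALSE
```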
>> So what am I doing wrong? And there ought to be a better way to do this.
>>
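If a loop is still wanted, here is a sketch of the function with the apparent bugs fixed. It assumes the date column in arr is the Date column listed by names(arr) (the original code referenced all$DateTime), that both quarter columns are numbered the same way, and that the date strings match textually. Besides comparing as character, it uses & rather than && (&& looks only at the first element, and R 4.3.0 and later stop with an error on longer operands), counts rows with nrow() instead of length(), and returns the modified copy, since R functions work on a copy of their arguments.

```r
gooddates <- function(all, good) {
  # Compare labels, not factors, so differing level sets do not matter.
  all_date <- as.character(all$Date)
  good_date <- as.character(good$Date)
  all$gw <- FALSE
  for (j in seq_len(nrow(good))) {        # nrow(), not length(), counts rows
    # '&' is elementwise; '&&' would use only the first element.
    hit <- all_date == good_date[j] & all$quarter == good$quarter[j]
    all$gw[which(hit)] <- TRUE            # which() drops any NA matches
  }
  all  # return the modified copy; the caller's data frame is not changed
}

# Hypothetical toy data with factor Date columns whose levels differ.
arr <- data.frame(Date = factor(c("1/1/09", "1/1/09", "1/2/09")),
                  quarter = c(1, 1, 3))
gw <- data.frame(Date = factor(c("1/1/09", "1/5/09")), quarter = c(1, 2))
arr <- gooddates(arr, gw)
print(arr$gw)  # TRUE TRUE FALSE
```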
>> Thanks in advance,
>> Jim Rome
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


