[R] merging data frames gives all NAs

David Winsemius dwinsemius at comcast.net
Tue Feb 2 20:12:25 CET 2010


Yeah, sometimes the vocabulary we bring to a task does not match up  
(or "merge" properly) with the vocabulary that the developers use. In  
this case the merge operation is one that has a precise meaning in  
database lingo, which apparently you do not have background in.  My  
experience in trying to "append" objects ran into similar frustrations  
early in my R endeavors. For the life of me, I could not find any  
instances of "append" in the index of the references I was using.

I am glad that you found that material helpful, but I think its use of
the terms "join" and "merge" is incorrect in a database framework as
well, so I do not think it could be used as an unambiguous guide. Your
use of "combine" was likewise ambiguous. In composing questions to
R-help, it is advised that you post a small example and illustrate what
you want to see as a result.
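
For what it is worth, a small example of the kind meant here might look
like the sketch below. The data frame is a made-up stand-in for arrgnd
(only the names arrgnd, rwy and Runway come from the original post; the
values and the other columns are invented), so treat it as an
illustration rather than a diagnosis of the real data.

```r
# Toy stand-in for arrgnd; all values are invented for illustration.
arrgnd <- data.frame(
  Flight = c("DAL1", "TAI570", "AAL22", "JBU5"),
  Runway = c(NA, "31R", "31L", "22L"),
  Delay  = c(12, 0, 0, 3),
  stringsAsFactors = FALSE
)
rwy <- c("31R", "31L")

# Logical indexing with == returns an all-NA row wherever Runway is NA,
# because NA == "31R" is NA and indexing rows by NA yields an NA row.
with_na <- arrgnd[arrgnd$Runway == rwy[1], ]

# subset() treats NA in the condition as FALSE, so that row is dropped.
no_na <- subset(arrgnd, Runway == rwy[1])

# Appending rows: one subset per selected runway, stacked with rbind().
pieces <- lapply(rwy, function(r) subset(arrgnd, Runway == r))
arrw <- do.call(rbind, pieces)

# The same rows without a loop (%in% never returns NA).
arrw2 <- arrgnd[arrgnd$Runway %in% rwy, ]
```

Both arrw and arrw2 hold the rows for the selected runways, which is
appending in the rbind sense rather than a join.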

-- 
David.



On Feb 2, 2010, at 1:47 PM, James Rome wrote:

> On 2/1/2010 5:51 PM, David Winsemius wrote:
> I figured this out finally. I really believe that the R help
> write-ups are sorely lacking.

You should ponder whether you actually know enough to criticize the
help page when it describes the merge function as performing "database
join operations". My guess is that you don't. The help pages are not
designed to teach basic computer programming concepts.
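
To make the phrase "database join operations" concrete, here is a
minimal sketch of what a join does. The data frames flights and delays
and their values are assumptions for illustration, not taken from the
original post.

```r
# Made-up data; the column names and values are illustrative only.
flights <- data.frame(
  Flight = c("TAI570", "AAL22", "JBU5"),
  Runway = c("31R", "31L", "22L"),
  stringsAsFactors = FALSE
)
delays <- data.frame(
  Flight = c("TAI570", "AAL22"),
  Delay  = c(0, 5),
  stringsAsFactors = FALSE
)

# Inner join on the shared key: rows are matched on Flight and the
# columns of both data frames end up side by side.
inner <- merge(flights, delays, by = "Flight")

# Left outer join: unmatched rows of flights are kept, with NA for Delay.
left <- merge(flights, delays, by = "Flight", all.x = TRUE)

# With no 'by', merge() joins on every column name the two inputs share.
# Two subsets of one data frame share all columns, so only rows that are
# identical in both survive; for disjoint runways that is none.
r31R <- subset(flights, Runway == "31R")
r31L <- subset(flights, Runway == "31L")
none <- merge(r31R, r31L)
```

Here inner has two rows, left has three, and none has zero. As far as I
know, an all-NA row present in both pieces would count as identical and
so would survive such a join, which may be why the merge in the original
loop returned only NAs, though that is a guess without the real data.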



> As soon as I looked at http://www.statmethods.net/management/merging.html 
> , it was obvious:
> Adding Columns
> To merge two dataframes (datasets) horizontally, use the merge  
> function. In most cases, you join two dataframes by one or more  
> common key variables (i.e., an inner join).
>
> # merge two dataframes by ID
> total <- merge(dataframeA,dataframeB,by="ID")
>
> # merge two dataframes by ID and Country
> total <- merge(dataframeA,dataframeB,by=c("ID","Country"))
>
> Adding Rows
> To join two dataframes (datasets) vertically, use the rbind  
> function. The two dataframes must have the same variables, but they  
> do not have to be in the same order.
>
> total <- rbind(dataframeA, dataframeB)
>
> I needed to add rows, and had to use rbind. If the help for merge  
> said "To merge two dataframes (datasets) horizontally" I would have  
> known right away that it was the wrong function to use.
>
> Thanks for the help,
> Jim Rome
>
>
> On Feb 1, 2010, at 5:30 PM, David Winsemius wrote:
>
>>
>> On Feb 1, 2010, at 5:16 PM, James Rome wrote:
>>
>>> Dear kind R helpers,
>>>
>>> I have a vector of runway names in rwy  ("31R", "31L",...  the  
>>> number is user selectable)
>>> arrgnd is a data frame with data for all flights and all runways,  
>>> with a Runway column.
>>> I am trying to subset arrgnd into a data frame for each selected  
>>> runway, and then combine them back together using the following  
>>> code:
>>>
>>> for (j in 1:nr) {    # nr = number of user-selected runways
>>
>> Safer would be:
>>
>> for (j in seq_along(rwy)) {
>>
>>>   ar4rw = arrgnd[arrgnd$Runway==rwy[j],]
>>
>> Clearer would be :
>>
>>        ar4rw <- subset(arrgnd, Runway == rwy[j]) # and I think the NA
>> lines will also disappear.
>>
>>
>>>   if (j == 1) {
>>>       arrw = ar4rw
>>>   }
>>>   else {
>>>       arrw = merge(arrw, ar4rw)
>>>   }
>>> }
>>
>> You really should give us something like:
>>
>> dput(rwy)
>> dput( head(arrgnd, 10) )
>>>
>>> but, the merge step gives me a data frame with all NAs. In  
>>> addition, ar4rw always gets a row with NAs at the start, which I  
>>> do not understand. There are no rows with all NAs in the arrgnd  
>>> data frame.
>>> > ar4rw[1:2,]  # first time through for 31R
>>>       DateTime       Date month hour minute quarter weekday IATA  
>>> ICAO Flight
>>> NA <NA> <NA>    NA   NA     NA      NA      NA <NA> <NA> <NA>
>>> 529 1/1/09 21:46 2009-01-01     1   21     46      87       5    
>>> TA  TAI TAI570
>>>   AircraftType   Tail  Arrived   STA Runway     FromTo Delay
>>> NA <NA> <NA> <NA> <NA> <NA> <NA>    NA
>>> 529         A320 N496TA 21:46:58 22:30    31R MSLP /KJFK     0
>>>                      Operator            dq gw
>>> NA <NA> <NA> NA
>>> 529 TACA INTERNATIONAL AIRLINES 2009-01-01 87  1
>>>
>>> > ar4rw[1:2,]   # second time through for 31L
>>>       DateTime       Date month hour minute quarter weekday IATA  
>>> ICAO Flight
>>> NA <NA> <NA>    NA   NA     NA      NA      NA <NA> <NA> <NA>
>>> 552 1/1/09 23:03 2009-01-01     1   23      3      92       5    
>>> AA  AAL  AAL22
>>>   AircraftType   Tail  Arrived   STA Runway    FromTo  
>>> Delay          Operator
>>> NA <NA> <NA> <NA> <NA> <NA> <NA>    NA <NA>
>>> 552         B762 N329AA 23:03:35 23:10    31L LAX  /JFK     0  
>>> AMERICAN AIRLINES
>>>              dq gw
>>> NA <NA> NA
>>>
>>> But after the merge, I get all NAs. What am I doing wrong?
>>
>> The data layout gets mangled and I cannot tell what rows are being  
>> matched to what. Use dput to convey an unambiguous, and easily  
>> replicated example.
>>>
>>> Thanks,
>>> Jim Rome
>>>
>>> 552 2009-01-01 92  1
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


