[R] merging data frames gives all NAs
David Winsemius
dwinsemius at comcast.net
Tue Feb 2 20:12:25 CET 2010
Yeah, sometimes the vocabulary we bring to a task does not match up
(or "merge" properly) with the vocabulary that the developers use. In
this case the merge operation is one that has a precise meaning in
database lingo, which apparently you do not have background in. My
experience in trying to "append" objects ran into similar frustrations
early in my R endeavors. For the life of me, I could not find any
instances of "append" in the index of the references I was using.
I am glad that you found that material helpful, but I think its use of
the terms "join" and "merge" is incorrect in a database framework as
well, so I do not think it could be used as an unambiguous guide. Your
use of "combine" was likewise ambiguous. When composing questions to
R-help, you are advised to post a small example and illustrate what
you want to see as a result.
--
David.
On Feb 2, 2010, at 1:47 PM, James Rome wrote:
> On 2/1/2010 5:51 PM, David Winsemius wrote:
> I figured this out finally. I really believe that the R help write-
> ups are sorely lacking.
You should ponder whether you actually know enough to criticize the
help page when it describes the merge function as performing "database
join operations". My guess is that you don't. The help pages are not
designed to teach basic computer programming concepts.
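
For readers without a database background, a minimal sketch of what a
"database join" means may help. The data frames flights and airlines,
and their contents, are made up for illustration and do not come from
the poster's data.

```r
# Hypothetical data: two tables that share a key column, ICAO.
flights  <- data.frame(ICAO   = c("TAI", "AAL", "XXX"),
                       Flight = c("TAI570", "AAL22", "XXX1"),
                       stringsAsFactors = FALSE)
airlines <- data.frame(ICAO     = c("AAL", "TAI"),
                       Operator = c("AMERICAN AIRLINES",
                                    "TACA INTERNATIONAL AIRLINES"),
                       stringsAsFactors = FALSE)

# Inner join: rows are paired where ICAO matches. "XXX" has no partner
# in airlines and is dropped. The result gains columns, not rows.
joined <- merge(flights, airlines, by = "ICAO")

# all.x = TRUE keeps unmatched rows of flights (a left outer join) and
# fills the missing Operator with NA.
left <- merge(flights, airlines, by = "ICAO", all.x = TRUE)
```

In other words, merge pairs rows by key; it does not stack one data
frame under another.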
> As soon as I looked at http://www.statmethods.net/management/merging.html
> , it was obvious:
> Adding Columns
> To merge two dataframes (datasets) horizontally, use the merge
> function. In most cases, you join two dataframes by one or more
> common key variables (i.e., an inner join).
>
> # merge two dataframes by ID
> total <- merge(dataframeA,dataframeB,by="ID")
>
> # merge two dataframes by ID and Country
> total <- merge(dataframeA,dataframeB,by=c("ID","Country"))
>
> Adding Rows
> To join two dataframes (datasets) vertically, use the rbind
> function. The two dataframes must have the same variables, but they
> do not have to be in the same order.
>
> total <- rbind(dataframeA, dataframeB)
>
> I needed to add rows, and had to use rbind. If the help for merge
> said "To merge two dataframes (datasets) horizontally" I would have
> known right away that it was the wrong function to use.
>
> Thanks for the help,
> Jim Rome
>
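
The difference between the two functions can be sketched on two small
data frames with identical columns. The objects a and b below are
hypothetical stand-ins for two per-runway subsets.

```r
# Hypothetical stand-ins for two per-runway subsets with the same columns.
a <- data.frame(Flight = c("TAI570", "JBU1"), Runway = "31R",
                stringsAsFactors = FALSE)
b <- data.frame(Flight = "AAL22", Runway = "31L",
                stringsAsFactors = FALSE)

stacked <- rbind(a, b)  # appends rows: 2 + 1 = 3 rows
joined  <- merge(a, b)  # joins on every shared column (Flight and Runway)

nrow(stacked)
nrow(joined)            # no row of a equals a row of b, so nothing matches
```

With no by argument, merge uses all columns the two data frames have in
common as the key, which here is every column.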
>
> On Feb 1, 2010, at 5:30 PM, David Winsemius wrote:
>
>>
>> On Feb 1, 2010, at 5:16 PM, James Rome wrote:
>>
>>> Dear kind R helpers,
>>>
>>> I have a vector of runway names in rwy ("31R", "31L",... the
>>> number is user selectable)
>>> arrgnd is a data frame with data for all flights and all runways,
>>> with a Runway column.
>>> I am trying to subset arrgnd into a data frame for each selected
>>> runway, and then combine them back together using the following
>>> code:
>>>
>>> for (j in 1:nr) { # nr = number of user-selected runways
>>
>> Safer would be:
>>
>> for (j in seq_along(rwy)) {
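
A short sketch of why seq_along is the safer choice, assuming the case
where the user selects no runways at all (the empty rwy here is a
made-up example):

```r
rwy <- character(0)   # hypothetical: no runways selected
nr  <- length(rwy)

# 1:nr is 1:0, which counts down: the loop body runs for j = 1 and j = 0.
visited_colon <- integer(0)
for (j in 1:nr) visited_colon <- c(visited_colon, j)

# seq_along(rwy) is integer(0), so the loop body never runs.
visited_seq <- integer(0)
for (j in seq_along(rwy)) visited_seq <- c(visited_seq, j)
```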
>>
>>> ar4rw = arrgnd[arrgnd$Runway==rwy[j],]
>>
>> Clearer would be :
>>
>> ar4rw <- subset(arrgnd, Runway == rwy[j]) # and I think the NA
>> lines will also disappear.
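
One plausible source of the all-NA row, assuming the Runway column
contains at least one missing value, is how "[" treats NA in a logical
index. The miniature arrgnd below is hypothetical.

```r
# Hypothetical miniature of arrgnd with one missing Runway value.
arrgnd <- data.frame(Flight = c("TAI570", "AAL22", "UNK1"),
                     Runway = c("31R", "31L", NA),
                     stringsAsFactors = FALSE)

# NA == "31R" is NA, and an NA in a logical row index yields an all-NA row.
by_bracket <- arrgnd[arrgnd$Runway == "31R", ]

# subset() treats NA in the condition as FALSE, so no NA row appears.
by_subset <- subset(arrgnd, Runway == "31R")

# which() keeps the bracket form but discards the NA positions.
by_which <- arrgnd[which(arrgnd$Runway == "31R"), ]
```

So a data frame with no all-NA rows can still produce one when a single
NA in the tested column reaches the row index.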
>>
>>
>>> if (j == 1) {
>>> arrw = ar4rw
>>> }
>>> else {
>>> arrw = merge(arrw, ar4rw)
>>> }
>>> }
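
As a sketch only, the loop above could probably be replaced by a single
subsetting step. The data below are hypothetical, and this assumes the
goal is simply "all rows whose Runway is one of the selected runways".

```r
# Hypothetical miniature of arrgnd and of the selected runways.
arrgnd <- data.frame(Flight = c("TAI570", "AAL22", "DAL5", "UNK1"),
                     Runway = c("31R", "31L", "22L", NA),
                     stringsAsFactors = FALSE)
rwy <- c("31R", "31L")

# %in% returns FALSE (never NA) for missing values, so no NA rows appear.
arrw <- arrgnd[arrgnd$Runway %in% rwy, ]

# Loop-style equivalent: one subset per runway, then stack the rows.
pieces <- lapply(rwy, function(r) subset(arrgnd, Runway == r))
arrw2  <- do.call(rbind, pieces)
```

The second form keeps the per-runway pieces available if they are
needed separately; the first is shorter when only the combined data
frame matters.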
>>
>> You really should give us something like:
>>
>> dput(rwy)
>> dput( head(arrgnd, 10) )
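
A minimal sketch of the round trip that dput makes possible, using a
made-up two-row arrgnd. The capture.output and eval(parse()) steps only
simulate pasting the text into a mail and reading it back.

```r
# Hypothetical miniature of arrgnd.
arrgnd <- data.frame(Flight = c("TAI570", "AAL22"),
                     Runway = c("31R", "31L"),
                     Delay  = c(0, 0),
                     stringsAsFactors = FALSE)

# dput() writes R code that rebuilds the object; that text goes in the mail.
txt <- capture.output(dput(head(arrgnd, 10)))

# A reader recreates the object by evaluating the pasted text.
rebuilt <- eval(parse(text = txt))
```

Unlike printed output, the dput text survives line wrapping and keeps
column types intact.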
>>>
>>> but, the merge step gives me a data frame with all NAs. In
>>> addition, ar4rw always gets a row with NAs at the start, which I
>>> do not understand. There are no rows with all NAs in the arrgnd
>>> data frame.
>>> > ar4rw[1:2,] # first time through for 31R
>>>         DateTime       Date month hour minute quarter weekday IATA ICAO Flight
>>> NA          <NA>       <NA>    NA   NA     NA      NA      NA <NA> <NA>   <NA>
>>> 529 1/1/09 21:46 2009-01-01     1   21     46      87       5   TA  TAI TAI570
>>>     AircraftType   Tail  Arrived   STA Runway     FromTo Delay
>>> NA          <NA>   <NA>     <NA>  <NA>   <NA>       <NA>    NA
>>> 529         A320 N496TA 21:46:58 22:30    31R MSLP /KJFK     0
>>>                        Operator            dq gw
>>> NA                         <NA>          <NA> NA
>>> 529 TACA INTERNATIONAL AIRLINES 2009-01-01 87  1
>>>
>>> > ar4rw[1:2,] # second time through for 31L
>>>         DateTime       Date month hour minute quarter weekday IATA ICAO Flight
>>> NA          <NA>       <NA>    NA   NA     NA      NA      NA <NA> <NA>   <NA>
>>> 552 1/1/09 23:03 2009-01-01     1   23      3      92       5   AA  AAL  AAL22
>>>     AircraftType   Tail  Arrived   STA Runway   FromTo Delay          Operator
>>> NA          <NA>   <NA>     <NA>  <NA>   <NA>     <NA>    NA              <NA>
>>> 552         B762 N329AA 23:03:35 23:10    31L LAX /JFK     0 AMERICAN AIRLINES
>>>                dq gw
>>> NA           <NA> NA
>>> 552 2009-01-01 92  1
>>>
>>> But after the merge, I get all NAs. What am I doing wrong?
>>
>> The data layout gets mangled and I cannot tell what rows are being
>> matched to what. Use dput to convey an unambiguous, and easily
>> replicated example.
>>>
>>> Thanks,
>>> Jim Rome
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT