[R] merging single column from different dataframe

David Winsemius dwinsemius at comcast.net
Sun Jun 3 22:25:29 CEST 2012


On Jun 3, 2012, at 3:22 PM, Kai Mx wrote:

> Hi all,
> probably really simple to solve, but having no background in programming
> I haven't been able to figure this out: I have two dataframes like
>
> df1 <- data.frame(names1=c('aa','ab', 'ac', 'ad'), var1=c(1,5,7,12))
> df2 <- data.frame(names2=c('aa', 'ab', 'ac', 'ad', 'ae'),
> var2=c(3,6,9,12,15))
>
> Now I want to merge var1 into df2 by matching the dataframes by the 'names'
> columns, i.e. something like
>
> df3 <- merge (df2, df1, by.x='names2', by.y='names1', all.x=T)
>
> However, the original dataframes have quite a lot of columns and I thought
> that I should be able to address the var1 column by something like
> df1$var[[df2$name2]].

Well, there is no column named `var` in df1. (`$` does partial matching, so
df1$var actually hands back df1$var1, but relying on that is fragile.) Even
if you meant to type `var1`, the expression df1$var[[df2$name2]] would not
make much sense: it would still be a failed attempt to access a named
vector, and df1$var1 is not named.

 > names( df1$var1)
NULL
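For the record, the lookup the question is reaching for can be written with
match(), which gives, for each element of df2$names2, its position in
df1$names1 (NA where there is none). This is a minimal sketch assuming the
example data from the question and that var1 is the only column wanted; the
merge() variant simply passes df1 subset to the key plus that one column.

```r
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'), var1 = c(1, 5, 7, 12))
df2 <- data.frame(names2 = c('aa', 'ab', 'ac', 'ad', 'ae'),
                  var2 = c(3, 6, 9, 12, 15))

# merge() with only the key and the one wanted column from df1;
# all.x = TRUE keeps df2 rows that have no partner in df1
df3 <- merge(df2, df1[c("names1", "var1")],
             by.x = "names2", by.y = "names1", all.x = TRUE)

# match() returns the position of each df2 name in df1 (NA if absent);
# as.character() makes the intent explicit in case the names are factors
idx <- match(as.character(df2$names2), as.character(df1$names1))

# Indexing var1 by those positions copies the column over; NA stays NA
df2$var1 <- df1$var1[idx]
df2
```

One design difference: match() keeps df2's row order and takes the first
hit if df1 has duplicated names, whereas merge() sorts by the key by
default and repeats rows for duplicated keys.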

The "[[" operation is different from the "[" operation. "[[" returns
only one item; "[" can return multiple items. In the case of dataframes
(of which df1$var1 is _not_ an example), "[[" returns one entire
column as a vector. If you had been trying to access a named vector
using the 'names' in the character vector df1$names1, and there were
any matches, then you might have had some success with '['.
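The "[[" versus "[" distinction can be seen on the df1 from the question.
A small sketch: stringsAsFactors = FALSE is spelled out so names1 is
character in any R version, and named_var1 is an illustrative name for
var1 with names attached via setNames() so that "[" can look values up by
name.

```r
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'), var1 = c(1, 5, 7, 12),
                  stringsAsFactors = FALSE)

# "[[" extracts one element: for a data frame, a whole column as a vector
col_vec <- df1[["var1"]]
class(col_vec)              # "numeric"

# "[" returns a sub-object of the same kind: a one-column data frame
col_df <- df1["var1"]
class(col_df)               # "data.frame"

# Attach names so var1 can be indexed by character with "["
named_var1 <- setNames(df1$var1, df1$names1)
named_var1[c("ac", "aa")]   # ac = 7, aa = 1

# "[[" on a named vector takes a single name and drops the name
named_var1[["ab"]]          # 5
```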

Even then there are gators in the swamp.

vec1 <- c(aa=3, gx =10,  ac=4, cc = 12)
vec1[df1$names1]
aa gx ac cc
  3 10  4 12

WTF?

Well, by default data.frame() converts character arguments into factor
variables (true when this was written; since R 4.0.0 the stringsAsFactors
default is FALSE), and a factor has an underlying integer representation,
so by the time df1$names1 got coerced it ended up as 1,2,3,4 and
referenced all of vec1. These other methods return something
appropriate:

vec1[as.character(df1$names1)]
   aa <NA>   ac <NA>
    3   NA    4   NA


vec1[which(names(vec1) %in% df1$names1)]
aa ac
  3  4

I happen to think that returning NA is unfortunate in the first
instance, but I did not construct the language and there must have
been some good reason to make it that way.
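The factor trap above can be reproduced in a self-contained way. Since
current R no longer makes factors by default, this sketch asks for them
explicitly with stringsAsFactors = TRUE; by_factor, by_label and by_match
are illustrative names.

```r
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'), var1 = c(1, 5, 7, 12),
                  stringsAsFactors = TRUE)
vec1 <- c(aa = 3, gx = 10, ac = 4, cc = 12)

# The factor's underlying integer codes (levels sorted: aa ab ac ad)
as.integer(df1$names1)       # 1 2 3 4

# Indexing by a factor uses those codes, so this is all of vec1
by_factor <- vec1[df1$names1]

# Indexing by the labels matches names; unmatched names give NA
by_label <- vec1[as.character(df1$names1)]

# Keeping only the names that exist avoids the NAs
by_match <- vec1[which(names(vec1) %in% df1$names1)]
```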

> Could somebody please enlighten me and/or maybe
> suggest a short tutorial for the extraction operator?

Arguments to "[" can be numeric, character, or logical. If numeric, the
call returns the values at those positions in the referenced object.
If character, it returns the items whose names match. If logical, it
returns the items for which the index is TRUE (and the index is
recycled, so this returns every second item in df1$var1):

 > df1$var1[c(FALSE, TRUE)]
[1]  5 12
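The three kinds of index can be compared side by side on the values of
df1$var1. A short sketch; var1 and named are illustrative names, and the
names given to named are arbitrary labels borrowed from the example.

```r
var1 <- c(1, 5, 7, 12)

# Numeric: values at those positions
var1[c(2, 4)]          # 5 12

# Logical: items where the index is TRUE; a short index is recycled,
# so c(FALSE, TRUE) picks every second element
var1[c(FALSE, TRUE)]   # 5 12

# A logical index usually comes from a comparison
var1[var1 > 6]         # 7 12

# Character: items matched by name (the vector needs names for this)
named <- setNames(var1, c("aa", "ab", "ac", "ad"))
named[c("ad", "ab")]   # ad = 12, ab = 5
```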


Spend some time working through the examples on ?Extract and then
rereading that help page at least three times, although it probably took
me ten or twenty times to get a pretty good grasp of it. The material
there is accurate and precise, but the subtleties are numerous.

-- 

David Winsemius, MD
West Hartford, CT


