[R] merging single column from different dataframe
David Winsemius
dwinsemius at comcast.net
Sun Jun 3 22:25:29 CEST 2012
On Jun 3, 2012, at 3:22 PM, Kai Mx wrote:
> Hi all,
> probably really simple to solve, but having no background in
> programming I
> haven't been able to figure this out: I have two dataframes like
>
> df1 <- data.frame(names1=c('aa','ab', 'ac', 'ad'), var1=c(1,5,7,12))
> df2 <- data.frame(names2=c('aa', 'ab', 'ac', 'ad', 'ae'),
> var2=c(3,6,9,12,15))
>
> Now I want merge var1 to df2 by matching the dataframes by the 'names'
> columns, i.e. something like
>
> df3 <- merge (df2, df1, by.x='names2', by.y='names1', all.x=T)
>
> However, the original dataframes have quite a lot of columns and I
> thought
> that I should be able to address the var1 column by something like
> df1$var[[df2$name2]].
Well, there is no column called `var` in df1 (although `$` does
partial matching on column names, so df1$var would quietly give you
var1), and df2 has no `name2` column at all. Even if you meant to type
`var1` and `names2`, df1$var1[[df2$names2]] would not make much sense,
since that would still be a failed attempt to index a named vector by
name, and df1$var1 is not named.
> names( df1$var1)
NULL
The "[[" operation is different from the "[" operation. "[[" returns
only one item. "[" returns multiple items. In the case of dataframes
(of which df1$var1 is _not_ an example), "[[" returns one entire
column as a vector. If you had been trying to access a named vector
using the 'names' in the character vector df1$names1 and there were
any matches then you might have had some success with '['.
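A minimal sketch of that difference, reusing df1 from the question
(the object names col_vec, col_df, several and one are only
illustrative):

```r
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'), var1 = c(1, 5, 7, 12))

# "[[" on a data frame pulls out one column as a plain vector
col_vec <- df1[["var1"]]

# "[" with a single index on a data frame keeps the data frame wrapper
col_df <- df1["var1"]

# On a vector, "[" can return several elements, "[[" exactly one
several <- df1$var1[c(1, 3)]
one <- df1$var1[[2]]
```

None of this involves names; character indexing with '[' only has a
chance of working when the vector actually carries names.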
Even then there are gators in the swamp.
vec1 <- c(aa=3, gx =10, ac=4, cc = 12)
vec1[df1$names1]
aa gx ac cc
3 10 4 12
WTF?
Well, by default R's dataframes construct factor variables for
character arguments, and factors have an underlying integer
representation, so by the time df1$names1 got used as an index it
ended up as 1,2,3,4 and referenced all of vec1. These other methods
would return something appropriate:
vec1[as.character(df1$names1)]
aa <NA> ac <NA>
3 NA 4 NA
vec1[which(names(vec1) %in% df1$names1)]
aa ac
3 4
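To see the factor issue in isolation, here is a small sketch. It
assumes the column really is a factor, so stringsAsFactors = TRUE is
passed explicitly; since R 4.0.0 data.frame() no longer converts
strings to factors by default, and without that argument the surprise
above would not reproduce. The names codes, by_factor and by_label are
only illustrative.

```r
# Force the factor conversion explicitly (an assumption for this sketch)
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'),
                  var1 = c(1, 5, 7, 12),
                  stringsAsFactors = TRUE)
vec1 <- c(aa = 3, gx = 10, ac = 4, cc = 12)

# The integer codes hiding behind the factor labels
codes <- as.integer(df1$names1)

# Indexing with the factor uses those codes, i.e. positions 1 to 4
by_factor <- vec1[df1$names1]

# Converting to character first indexes by name instead
by_label <- vec1[as.character(df1$names1)]
```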
I happen to think that returning NA for the unmatched names, as the
as.character() version does, is unfortunate, but I did not construct
the language and there must have been some good reason to make it
that way.
> Could somebody please enlighten me and/or maybe
> suggest a short tutorial for the extraction operator?
Arguments to "[" can be numeric, character, or logical. If numeric, it
will return values at the sequence locations along the referenced
object. If character, it will return the matched items with those
names. If logical, the call will return those items for which the
index is TRUE. There will be argument recycling, so this will return
every second item in df1$var1:
> df1$var1[c(FALSE, TRUE)]
[1] 5 12
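The three index types side by side, as a small sketch on a named
vector (vec and the result names are made up for illustration):

```r
vec <- c(aa = 3, gx = 10, ac = 4, cc = 12)

# Numeric index: elements at positions 1 and 3
by_position <- vec[c(1, 3)]

# Character index: elements matched by name, in the order requested
by_name <- vec[c("ac", "aa")]

# Logical index: elements where the condition is TRUE
by_logical <- vec[vec > 3]

# A short logical index is recycled: FALSE TRUE FALSE TRUE
recycled <- vec[c(FALSE, TRUE)]
```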
Spend some time working through the examples on ?Extract and then
re-reading that help page at least three times, although it probably
took me ten or twenty times to get a pretty good grasp of it. The
material there is accurate and precise, but the subtleties are
numerous.
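Coming back to the original question: if the goal is only to carry
var1 over to df2 without dragging along every other column of df1, one
possible approach is to build a position index with match() and use it
inside "[". This is a sketch rather than a recommendation over
merge(); idx is just an illustrative name.

```r
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'), var1 = c(1, 5, 7, 12))
df2 <- data.frame(names2 = c('aa', 'ab', 'ac', 'ad', 'ae'),
                  var2 = c(3, 6, 9, 12, 15))

# For each name in df2, the row position of that name in df1 (NA if absent)
idx <- match(df2$names2, df1$names1)

# Numeric indexing with idx lines var1 up with the rows of df2;
# the unmatched 'ae' row gets NA
df2$var1 <- df1$var1[idx]
```

match() compares factors by their labels, so this should behave the
same whether or not the name columns were converted to factors.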
--
David Winsemius, MD
West Hartford, CT