[R] group by and merge two dataframes

Massimo Bressan mbressan at arpa.veneto.it
Fri May 9 10:00:16 CEST 2014


yes thanks, that's correct!

here a slight variation inspired by your solution: a cartesian product 
restricted to non duplicated records to get the logical vector i to be 
used in the next natural join

i<-!duplicated(merge(df1$id,df1$item, by=NULL))
merge(df1[i,],df2)

thanks

Il 08/05/2014 18:43, arun ha scritto:
> Hi,
> May be:
> indx <- !duplicated(as.character(interaction(df1[,-3])))
> merge(df1[indx,],df2)
> A.K.
>
>
>
>
> On Thursday, May 8, 2014 12:34 PM, Massimo Bressan<mbressan at arpa.veneto.it>  wrote:
> yes, thank you for all your replies, they worked out correctly indeed...
>
> ...but because of my fault, by then working on my real data I fully
> realised that I should have mentioned something that is changing (quite
> a lot, in fact) the terms of the problem...
>
> please would you consider the following (consistent) variation ?
>
> df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
> rep("C",2)), v=rnorm(6))
> df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))
>
> and again I need to group the first dataframe "df1" both by "id" and by
> the first record of "v", and then merge with the second dataframe "df2"
> (again by "id")
>
> now, how to do that?
> (that's why probably I was pointing in my first post to the use of sqldf)
>
> thanks
>
> ps: I'm in doubt wheter I must open another thread or keep going with
> this one (really sorry for the eventual violation of the R-help netiquette)
>
>
> Il 08/05/2014 17:14, arun ha scritto:
>> Hi,
>> May be this helps:
>>     merge(unique(df1),df2)
>> A.K.
>>
>>
>>
>>
>>
>> On Thursday, May 8, 2014 5:46 AM, Massimo Bressan<mbressan at arpa.veneto.it>  wrote:
>> given this "bare bone" example:
>>
>> df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
>> rep("C",2)))
>> df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))
>>
>> I need to group the first dataframe "df1" by "id" and then merge with
>> the second dataframe "df2" (again by "id")
>> so far I've manged to accomplish the task by something like the following...
>>
>> # start
>>
>> require(sqldf)
>> tmp<-sqldf("select * from df1 group by id")
>> merge(tmp, df2)
>>
>> #end
>>
>> now I'm wonderng if there is a more efficient and/or elegant way to
>> perform it (also because in fact I'm dealing with much more "heavy"
>> dataframes);
>>
>> may be possible through a single sql statement?  or by using a different
>> package functions (e.g. dplyr)?
>> my attempts towards these alternative approaches miserably failed ...
>>
>> thanks
>>
>> ______________________________________________
>> R-help at r-project.org  mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>



More information about the R-help mailing list