[R-sig-hpc] A question about merge

陳慶全 zw12356 at gmail.com
Sat Oct 17 10:52:45 CEST 2015


Hello!

I am trying to merge more than two data.frames. I know that I can do
this with

```r
Reduce(function(x, y) merge(x, y), listOfDataFrames)
```

But if my data.frames contain variables that share the same name, merge
renames them to variableName.x, variableName.y, and so on. Example:



    df1 = data.frame(x = 1:5, y = rnorm(5))
    df2 = data.frame(x = 1:5, y = rnorm(5))
    merge(df1, df2, by = "x")

This returns a data.frame with the columns x, y.x, and y.y.
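To show how the suffixes pile up with more than two data.frames, here is a minimal sketch. The fixed values and the third data.frame df3 are made up for illustration, and I assume x is the key in every data.frame. Note that merge() joins on all shared column names by default, so the sketch sets by = "x" explicitly.

```r
# Illustrative data with fixed values (not my real data)
df1 <- data.frame(x = 1:3, y = c(1, 2, 3))
df2 <- data.frame(x = 1:3, y = c(10, 20, 30))
df3 <- data.frame(x = 1:3, y = c(100, 200, 300))

# Merge the whole list pairwise on the key column x
merged <- Reduce(function(a, b) merge(a, b, by = "x"), list(df1, df2, df3))

# The three y columns survive side by side under different names
names(merged)
```

The shared variable ends up spread over several columns instead of being combined into one.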

What I want instead is to sum the variables that share the same name.

For example:


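    library(plyr)
    library(dplyr)
    # Stack all data.frames row-wise, filling missing columns with NA
    wide_table = rbind.fill(list(df1, df2)) %>% tbl_dt(FALSE)
    # Sum that returns NA only when every value is NA
    sum_without_na = function(vec) ifelse(all(is.na(vec)), NA_integer_, sum(vec, na.rm = TRUE))
    # Group by the key column x and sum every other column
    out = wide_table %>% group_by(x) %>% summarise_each(funs(sum_without_na))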


Although the scripts above work, they are too slow when df1 and df2 have
more columns or when there are more data.frames to combine.

Is there a faster function, in base R or in a package, that can do this
and merge all the data.frames at once?



