[R-sig-hpc] A question about merge
陳慶全
zw12356 at gmail.com
Sat Oct 17 10:52:45 CEST 2015
Hello!
I am trying to merge more than two data.frames. I know that I can do
this with
```r
Reduce(function(x, y) merge(x, y), listOfDataFrames)
```
But if my data.frames contain variables that share the same name, merge
renames them to variableName.x, variableName.y, and so on. Example:
    df1 = data.frame(x = 1:5, y = rnorm(5))
    df2 = data.frame(x = 1:5, y = rnorm(5))
    merge(df1, df2, by = "x")
It returns a data.frame containing x, y.x and y.y.
What I want instead is to sum the variables that share the same name.
For example:
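    library(plyr)
    library(dplyr)
    # stack all rows; columns missing from one data.frame are filled with NA
    wide_table = rbind.fill(list(df1, df2))
    # sum that stays NA only when every value in the group is NA
    sum_without_na = function(vec) {
      if (all(is.na(vec))) NA_real_ else sum(vec, na.rm = TRUE)
    }
    # group by the key x and sum every remaining column
    out = wide_table %>% group_by(x) %>%
      summarise_all(sum_without_na)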
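To make the goal concrete, here is a minimal sketch with fixed numbers in
place of rnorm(), using only base R (rbind plus aggregate). The names a, b,
stacked and wanted are made up for illustration, and the sketch assumes
both data.frames have exactly the same columns.

```r
# Hypothetical inputs with fixed values so the result is easy to check.
a <- data.frame(x = 1:3, y = c(1, 2, 3))
b <- data.frame(x = 1:3, y = c(10, 20, 30))

# Stack the rows, then sum every non-key column within each value of x.
stacked <- rbind(a, b)
wanted <- aggregate(. ~ x, data = stacked, FUN = sum)
print(wanted)
```

The result has one row per value of x and a single y column holding the
sum of the two inputs, instead of separate y.x and y.y columns.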
Although the plyr/dplyr script above does the job, it is too slow when
df1 and df2 have more columns or when there are more data.frames.
Is there a faster function, in base R or in a package, that can merge
all the data.frames at once?
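One candidate that stays in base R is rowsum(), which sums the rows of a
numeric matrix by group. The sketch below is only an illustration:
sum_by_key is a made-up helper name, it assumes the key column is called x
and that every other column is numeric, and it has not been benchmarked
against the plyr/dplyr version.

```r
# Hypothetical helper: sum all non-key columns across a list of data.frames,
# grouped by a shared key column. Assumes every non-key column is numeric.
sum_by_key <- function(dfs, key = "x") {
  all_cols <- unique(unlist(lapply(dfs, names)))
  value_cols <- setdiff(all_cols, key)

  # Give every data.frame the same columns, filling absent ones with NA.
  filled <- lapply(dfs, function(d) {
    for (nm in setdiff(all_cols, names(d))) d[[nm]] <- NA_real_
    d[all_cols]
  })
  stacked <- do.call(rbind, filled)

  # Replace NA by 0 for summing, but remember which cells were observed.
  vals <- as.matrix(stacked[value_cols])
  present <- !is.na(vals)
  vals[!present] <- 0

  # rowsum() adds up the rows that share a group value.
  sums <- rowsum(vals, stacked[[key]])
  counts <- rowsum(present * 1, stacked[[key]])
  sums[counts == 0] <- NA  # keep NA where a group had no observed value
  rownames(sums) <- NULL

  # rowsum() orders its rows like sort(unique(group)).
  keys <- sort(unique(stacked[[key]]))
  data.frame(setNames(list(keys), key), sums)
}

# Usage with three small, made-up data.frames whose columns only partly overlap.
dfs <- list(
  data.frame(x = 1:3, y = c(1, 2, 3)),
  data.frame(x = 1:3, y = c(10, 20, 30), z = c(5, NA, 7)),
  data.frame(x = 2:4, z = c(1, NA, 2))
)
out <- sum_by_key(dfs)
print(out)
```

All data.frames are stacked once and summed in one grouped pass, rather
than merged pairwise, so no .x/.y suffixes are created. A cell stays NA
only when no input supplied a value for that key and column.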