[R] Trying to avoid the loop while merging two data frames

jim holtman jholtman at gmail.com
Tue Dec 22 20:26:17 CET 2015


You seem to be saving 'myid' and then dropping it again with the last
statement in the loop ('myid' is column 6 by then, and c(5, 1:4) only
keeps columns 1 to 5):

 result[[i]] <- result[[i]][c(5, 1:4)]
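
To make the column problem concrete, here is a small sketch. The data frame 'x' is a made-up stand-in for one element of 'result' inside the loop (it is not from the original post); it only assumes the five 'myinfo' columns with 'myid' appended last.

```r
# Hypothetical stand-in for result[[i]] after 'myid' has been added:
# the five 'myinfo' columns, with 'myid' appended as column 6.
x <- data.frame(version = 1L, a = 0.1, b = 0.2, c = 0.3, d = 0.4)
x$myid <- 1001L

# What the loop does: columns 5, 1, 2, 3, 4, so 'myid' (column 6) is gone.
dropped <- names(x[c(5, 1:4)])

# Positional fix: move column 6 to the front.
by_position <- names(x[c(6, 1:5)])

# Selecting by name does not depend on how many columns there are.
fixed <- x[c("myid", setdiff(names(x), "myid"))]

dropped
by_position
names(fixed)
```

Indexing by name is probably the safer choice here, since it keeps working if 'myinfo' gains or loses columns.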

Why doesn't 'merge' work for you?  I tried it on your data and got back
the same number of rows as your loop. The rows may not be in the same
order, but the content looks the same, and the result does have 'myid' in it.
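
For reference, a minimal sketch of the 'merge' approach on the example data from the original post. The name 'merged' and the final reordering step are my own additions for illustration. The order of rows within each 'myid' is not guaranteed to match the loop's output.

```r
# Rebuild the example data from the original post.
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)),
                     a = rnorm(60), b = rnorm(60),
                     c = rnorm(60), d = rnorm(60))

# One call replaces the loop: each row of 'mydata' is paired with all
# 30 rows of 'myinfo' that share its 'version'.
merged <- merge(mydata, myinfo, by = "version")

# Put 'myid' first and sort by it, so the layout resembles the loop's intent.
merged <- merged[order(merged$myid),
                 c("myid", "version", "a", "b", "c", "d")]

nrow(merged)   # 3000: 100 ids x 30 rows per version
head(merged)
```

If 'merge' really is off the table, it would help to know why (memory, speed, something else), since that determines which alternative makes sense.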


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Dec 22, 2015 at 12:27 PM, Dimitri Liakhovitski <
dimitri.liakhovitski at gmail.com> wrote:

> Hello!
> I have a solution for my task that is based on a loop. However, it's
> too slow for my real-life problem that is much larger in scope.
> However, I cannot use merge. Any advice on how to do it faster?
> Thanks a lot for any hint on how to speed it up!
>
> # I have 'mydata' data frame:
> set.seed(123)
> mydata <- data.frame(myid = 1001:1100,
>                      version = sample(1:20, 100, replace = TRUE))
> head(mydata)
> table(mydata$version)
>
> # I have 'myinfo' data frame that contains information for each 'version':
> set.seed(12)
> myinfo <- data.frame(version = sort(rep(1:20, 30)), a = rnorm(60),
>                      b = rnorm(60), c = rnorm(60), d = rnorm(60))
> head(myinfo, 40)
>
> ### MY SOLUTION WITH A LOOP:
> ### Looping through each id of mydata and grabbing
> ### all columns from 'myinfo' for the corresponding 'version':
>
> # 1. Creating placeholder list for the results:
> result <- split(mydata[c("myid", "version")], f = list(mydata$myid))
> length(result)
> (result)[1:3]
>
>
> # 2. Looping through each element of 'result':
> for(i in 1:length(result)){
>       id <- result[[i]]$myid
>       result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ]
>       result[[i]]$myid <- id
>       result[[i]] <- result[[i]][c(5, 1:4)]
> }
> result <- do.call(rbind, result)
> head(result) # This is the desired result
>
> --
> Dimitri Liakhovitski
