[R] Trying to avoid the loop while merging two data frames

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Tue Dec 22 21:34:23 CET 2015


I know I am overwriting 'myid' in that last statement.
I didn't think merge would solve it because each version in mydata is
assigned to more than one id, so I assumed I couldn't merge by version.
I am not sure how to answer the question about "the problem":
I described the current state and the desired state. If possible, I'd
like to get from the current state to the desired state faster than
with a loop.
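
A note on the merge question, offered as a sketch rather than a
definitive answer: base R's merge() does accept a key that repeats on
either side, and returns every matching pair of rows. Assuming the
desired result is "all 'myinfo' rows for each id's version, with 'myid'
attached", something like the following may be enough. The name
'merged' and the final column order are assumptions, not part of the
original post, and the row order within an id is not guaranteed to
match the loop's output.

```r
# Rebuild the example data from the original post.
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)),
                     a = rnorm(60), b = rnorm(60),
                     c = rnorm(60), d = rnorm(60))

# 'version' repeats in both data frames, so each myid is paired with
# all 30 'myinfo' rows that share its version.
merged <- merge(mydata, myinfo, by = "version")

# Sort by id and put 'myid' first (assumed layout; adjust as needed).
merged <- merged[order(merged$myid),
                 c("myid", "version", "a", "b", "c", "d")]
rownames(merged) <- NULL

head(merged)
```

Since every id has exactly one version and every version has 30 rows in
'myinfo', the result should have 100 * 30 rows, which is an easy sanity
check to run against the loop's output.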

On Tue, Dec 22, 2015 at 2:26 PM, jim holtman <jholtman at gmail.com> wrote:
> You seem to be saving 'myid' and then overwriting it with the last
> statement:
>
>  result[[i]] <- result[[i]][c(5, 1:4)]
>
> Why doesn't 'merge' work for you?  I tried it on your data, and seem to get
> back the same number of rows; may not be in the same order, but the content
> looks the same, and it does have 'myid' on it.
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Tue, Dec 22, 2015 at 12:27 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>>
>> Hello!
>> I have a solution for my task that is based on a loop, but it's
>> too slow for my real-life problem, which is much larger in scope.
>> I also cannot use merge. Any advice on how to do it faster?
>> Thanks a lot for any hint on how to speed it up!
>>
>> # I have 'mydata' data frame:
>> set.seed(123)
>> mydata <- data.frame(myid = 1001:1100,
>>                      version = sample(1:20, 100, replace = TRUE))
>> head(mydata)
>> table(mydata$version)
>>
>> # I have 'myinfo' data frame that contains information for each 'version':
>> set.seed(12)
>> myinfo <- data.frame(version = sort(rep(1:20, 30)),
>>                      a = rnorm(60), b = rnorm(60),
>>                      c = rnorm(60), d = rnorm(60))
>> head(myinfo, 40)
>>
>> ### MY SOLUTION WITH A LOOP:
>> ### Looping through each id of mydata and grabbing
>> ### all columns from 'myinfo' for the corresponding 'version':
>>
>> # 1. Creating placeholder list for the results:
>> result <- split(mydata[c("myid", "version")], f = list(mydata$myid))
>> length(result)
>> (result)[1:3]
>>
>>
>> # 2. Looping through each element of 'result':
>> for(i in 1:length(result)){
>>       id <- result[[i]]$myid
>>       result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ]
>>       result[[i]]$myid <- id
>>       result[[i]] <- result[[i]][c(5, 1:4)]
>> }
>> result <- do.call(rbind, result)
>> head(result) # This is the desired result
>>
>> --
>> Dimitri Liakhovitski
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Dimitri Liakhovitski


