[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames

Rathore, Saubhagya Singh saubhagya at gatech.edu
Fri Jun 23 21:48:32 CEST 2017


Thank you very much, Mr. Barradas, for your suggestions. I am embarrassed that I did not pre-allocate D3; pre-allocating it alone significantly reduced the run time of my code. In my actual problem, D1 and D2 each consist of 600 observations. Your second suggestion, using do.call, proved faster than the nested for loops. The run times (in seconds) of both methods on my problem are given below.

1) Nested Loop with pre-allocation: 
#    user  system elapsed 
# 1199.89   14.08 1215.75

2) expand.grid / do.call:
#    user  system elapsed
#  131.56    0.00  131.61
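
For anyone who wants to reproduce this kind of comparison, timings in the format above come from system.time(). The following is only a minimal sketch: it uses the small example data from this thread rather than my 600-row data, and set.seed() and the name `timing` are additions for illustration.

```r
# Small stand-in data in the shape used in this thread (not the 600-row data)
set.seed(1)
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))
n <- nrow(D1) * nrow(D2)

# system.time() evaluates the expression and returns user/system/elapsed seconds
timing <- system.time({
  D4 <- data.frame(distance = numeric(n), difference = numeric(n))
  k <- 0
  for (i in seq_len(nrow(D1))) {
    for (j in seq_len(nrow(D2))) {
      k <- k + 1
      D4[k, ] <- c(sqrt(sum((D1[i, 1:2] - D2[j, 1:2])^2)),
                   (D1[i, 3] - D2[j, 3])^2)
    }
  }
})
print(timing)
```

Wrapping each variant in its own system.time() call gives numbers that can be compared directly, although they will vary from run to run and from machine to machine.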

Thank you again for your generous help.

-----Original Message-----
From: Rui Barradas [mailto:ruipbarradas at sapo.pt] 
Sent: Friday, June 23, 2017 12:03 PM
To: Rathore, Saubhagya Singh <saubhagya at gatech.edu>; r-help-owner at r-project.org; r-help at r-project.org
Subject: Re: [R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames

Hello,

Another way would be

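n <- nrow(expand.grid(seq_len(nrow(D1)), seq_len(nrow(D2))))  # number of row pairs
D5 <- data.frame(distance=integer(n), difference=integer(n))

# For each row i of D1, sapply over the rows j of D2 returns a 2 x nrow(D2)
# matrix; t() turns it into one row per pair and rbind stacks the blocks.
D5[] <- do.call(rbind, lapply(seq_len(nrow(D1)), function(i)
  t(sapply(seq_len(nrow(D2)), function(j) {
    c(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
      difference=(D1[i,3]-D2[j,3])^2)
  }))
))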

identical(D3, D5)
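
A fully vectorised variant is also possible. This is only a sketch and was not part of the suggestions above: it assumes the columns are named x, y and z as in the example data, and the names `idx` and `D6` as well as the set.seed() call are introduced here for illustration.

```r
# Example data in the shape used in this thread
set.seed(1)
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# All (i, j) index pairs. The first argument of expand.grid varies fastest,
# so j cycles within each i, which matches the row order of the nested loops.
idx <- expand.grid(j = seq_len(nrow(D2)), i = seq_len(nrow(D1)))

# Work on whole columns at once instead of one row pair at a time
D6 <- data.frame(
  distance   = sqrt((D1$x[idx$i] - D2$x[idx$j])^2 +
                    (D1$y[idx$i] - D2$y[idx$j])^2),
  difference = (D1$z[idx$i] - D2$z[idx$j])^2
)
```

The result should agree with D3 up to floating-point tolerance, so all.equal(D3, D6, check.attributes = FALSE) is a safer check here than identical().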


In my first answer I forgot to say that constructs like 1:nrow(...), or more generally 1:m, are error prone. If m == 0, then 1:0 is the vector c(1, 0), so for(i in 1:0) is a perfectly legal loop that runs twice, once with the out-of-range index 1 and once with the zero index 0, instead of not running at all. The solution is to use ?seq_len or ?seq_along (same help page), like this: for(i in seq_len(m)). In your case m is either nrow(D1) or nrow(D2).
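
The difference between 1:m and seq_len(m) when m is 0 can be checked directly. A small sketch, in which the counter names are only illustrative:

```r
m <- 0

colon_seq <- 1:m        # counts down from 1 to 0, so it has two elements
safe_seq  <- seq_len(m) # empty integer vector

# Count how many times each loop body runs
count_colon <- 0
for (i in 1:m) count_colon <- count_colon + 1       # runs for i = 1 and i = 0

count_seq <- 0
for (i in seq_len(m)) count_seq <- count_seq + 1    # body is never entered
```

With an empty data.frame, the first loop would therefore try to index rows 1 and 0, while the second loop is skipped entirely.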

Hope this helps,

Rui Barradas



On 23-06-2017 16:35, Rui Barradas wrote:
> Hello,
>
> The obvious way would be to preallocate the resulting data.frame, since
> expanding an empty one on each iteration is a time-expensive operation.
>
> n <- nrow(expand.grid(1:nrow(D1), 1:nrow(D2)))
> D4 <- data.frame(distance=integer(n),difference=integer(n))
> k <- 0
> for (i in 1:nrow(D1)){
>      for (j in 1:nrow(D2))  {
>          k <- k + 1
>          D4[k, ] <- c(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
>                       difference=(D1[i,3]-D2[j,3])^2)
>
>      }
> }
>
> identical(D3, D4)
>
> Hope this helps,
>
> Rui Barradas
>
> On 23-06-2017 16:19, Rathore, Saubhagya Singh wrote:
>> For some reason, the content was not visible in my last mail, so I am
>> posting it again.
>>
>> Dear Members,
>>
>> I have two data frames with different numbers of rows. I need to
>> apply a set of functions to each possible combination of rows, with
>> one row coming from the first data frame and the other from the second.
>> Though I am able to perform this task using for loops, I feel that
>> there must be a more efficient way to do it. An example case is given
>> below. D1 and D2 are the two data frames. I need to build D3, whose
>> first column is the Euclidean distance in the x-y plane and whose second
>> column is the squared difference of the z values, for each row pair
>> from D1 and D2.
>>
>> D1<-data.frame(x=1:5,y=6:10,z=rnorm(5))
>> D2<-data.frame(x=19:30,y=41:52,z=rnorm(12))
>> D3<-data.frame(distance=integer(0),difference=integer(0))
>>
>> for (i in 1:nrow(D1)){
>>   for (j in 1:nrow(D2))  {
>>     temp<-data.frame(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
>>                      difference=(D1[i,3]-D2[j,3])^2)
>>     D3<-rbind(D3,temp)
>>   }
>> }
>>
>> Thank you
>>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of 
>> r-help-owner at r-project.org
>> Sent: Friday, June 23, 2017 10:47 AM
>> To: Rathore, Saubhagya Singh <saubhagya at gatech.edu>
>> Subject: R version 3.3.2, Windows 10: Applying a function to each 
>> possible pair of rows from two different data-frames
>>
>> The message's content type was not explicitly allowed
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


