[Rd] 10x slower merge in mac 2.9.1 vs. 2.9.0 (PR#13890)

adrian_d at eskimo.com
Thu Aug 13 18:05:43 CEST 2009


This issue has been reported before:
http://thread.gmane.org/gmane.comp.lang.r.devel/20945/focus=20959

It happens when data frames contain character strings.
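A minimal sketch for checking this claim: Simon's test case quoted further down uses only numeric columns, so the variant here adds one character column to each data frame. The column names `s1` and `s2`, the `sprintf("id%05d", ...)` string values, and the choice of `stringsAsFactors = FALSE` are hypothetical illustrations, not Rick's actual CDC data. No timings are claimed. The idea is to run the same script under 2.9.0 and 2.9.1 and compare.

```r
# Hypothetical variant of Simon's test case with character columns added.
# s1/s2 and their values are illustrative assumptions, not the reporter's data.
set.seed(1)
n <- 10000
d1 <- data.frame(seqn = as.integer(runif(n) * n),
                 a    = rnorm(n),
                 s1   = sprintf("id%05d", sample.int(n, n, replace = TRUE)),
                 stringsAsFactors = FALSE)   # keep strings as character
d2 <- data.frame(seqn = as.integer(runif(n) * n),
                 b    = rnorm(n),
                 s2   = sprintf("id%05d", sample.int(n, n, replace = TRUE)),
                 stringsAsFactors = FALSE)
# Same left join as in the report; the result is assigned to m for inspection
print(system.time(m <- merge(d1, d2, by = "seqn", all.x = TRUE)))
```

In R 2.9.x `data.frame()` converts strings to factors by default, so it may also be worth timing the same script with `stringsAsFactors = TRUE` to see whether factor or character columns are the ones affected.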

Thanks,
Adrian



On Thu, 13 Aug 2009, Simon Urbanek wrote:

> Rick,
>
> I'm sorry, but I cannot reproduce it. You didn't supply sessionInfo() and the 
> actual data, so all I can do is guess, but according to your description this 
> test case shows no difference:
>
> set.seed(1)
> n=10000
> d1=data.frame(seqn=as.integer(runif(n)*n),a=rnorm(n),b=rnorm(n),c=rnorm(n),d=rnorm(n),e=rnorm(n),f=rnorm(n),g=rnorm(n),h=rnorm(n),i=rnorm(n))
> d2=data.frame(seqn=as.integer(runif(n)*n),a=rnorm(n),b=rnorm(n),c=rnorm(n),d=rnorm(n),e=rnorm(n),f=rnorm(n),g=rnorm(n),h=rnorm(n),i=rnorm(n))
> system.time(merge(d1,d2,by="seqn",all.x=TRUE))
>
> R 2.9.1:
>> system.time(merge(d1,d2,by="seqn",all.x=TRUE))
>  user  system elapsed
> 0.150   0.067   0.217
>
> R 2.9.0:
>> system.time(merge(d1,d2,by="seqn",all.x=TRUE))
>  user  system elapsed
> 0.148   0.068   0.216
>
> To substantiate your claim, please provide a reproducible example as well as 
> sessionInfo() [and details on how you run it - GUI, CLI, ...], but I suspect 
> the difference may be in your data, not R.
>
> Thanks,
> Simon
>
>
> On Aug 12, 2009, at 12:25, richard_stahlhut at urmc.rochester.edu wrote:
>
>> Full_Name: Rick Stahlhut
>> Version: 2.9.1
>> OS: OS X 10.5.7
>> Submission from: (NULL) (128.151.71.23)
>> 
>> 
>> I upgraded to 2.9.1 today from 2.9.0. I work with large CDC (Centers for
>> Disease Control and Prevention) datasets and frequently start with a series
>> of 23 large-ish merges to create the final dataset I work on. I do this each
>> time because (a) R is fast, so why not? and (b) the datasets occasionally get
>> updated by CDC, and it's easier to swap in new files that way.
>> 
>> One such merge joins two data.frames with 10 variables and 10,000 rows each.
>> The command in question is:
>> 
>> temp = merge (demo.2,ph,by="seqn",all.x=TRUE)
>> 
>> In 2.9.0, this command took 3.3 seconds.
>> In 2.9.1, it took 35.8 seconds.
>> 
>> I have reverted back to 2.9.0.
>> 
>> Additional packages loaded are:
>> 
>> library(Hmisc)
>> library(alr3)
>> library(epicalc)
>> library(ggplot2)
>> library(lattice)
>> library(reshape)
>> library(survey)
>> library(car)
>> 
>> Thanks very much for all the effort. R is wonderful.
>> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 
>


