[Rd] 10x slower merge in mac 2.9.1 vs. 2.9.0 (PR#13890)
Adrian Dragulescu
adrian_d at eskimo.com
Thu Aug 13 18:01:54 CEST 2009
This issue has been reported before:
http://thread.gmane.org/gmane.comp.lang.r.devel/20945/focus=20959
It happens when the data frames being merged contain character strings.
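
Something along the following lines might be a starting point for reproducing it. This is only a sketch under the assumption that a character column is what triggers the slowdown; the helper make_df, the column name id, and the sizes are hypothetical and not taken from Rick's data. It is Simon's test case cut down to two numeric columns, with one character column added.

```r
# Hypothetical variant of Simon's test case with a character column added.
set.seed(1)
n <- 10000

# make_df is an illustrative helper, not part of the original report.
make_df <- function(n) {
  data.frame(
    seqn = as.integer(runif(n) * n),
    a    = rnorm(n),
    id   = paste("id", sample(n), sep = ""),  # character data
    stringsAsFactors = FALSE                  # keep id as character, not factor
  )
}

d1 <- make_df(n)
d2 <- make_df(n)

# Time the same kind of left merge as in the report.
timing <- system.time(res <- merge(d1, d2, by = "seqn", all.x = TRUE))
print(timing)
```

Running this under both 2.9.0 and 2.9.1 and comparing the elapsed times should show whether character columns alone account for the difference.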
Thanks,
Adrian
On Thu, 13 Aug 2009, Simon Urbanek wrote:
> Rick,
>
> I'm sorry, but I cannot reproduce it. You didn't supply sessionInfo() and the
> actual data, so all I can do is guess, but according to your description this
> test case shows no difference:
>
> set.seed(1)
> n=10000
> d1=data.frame(seqn=as.integer(runif(n)*n),a=rnorm(n),b=rnorm(n),c=rnorm(n),d=rnorm(n),e=rnorm(n),f=rnorm(n),g=rnorm(n),h=rnorm(n),i=rnorm(n))
> d2=data.frame(seqn=as.integer(runif(n)*n),a=rnorm(n),b=rnorm(n),c=rnorm(n),d=rnorm(n),e=rnorm(n),f=rnorm(n),g=rnorm(n),h=rnorm(n),i=rnorm(n))
> system.time(merge(d1,d2,by="seqn",all.x=TRUE))
>
> R 2.9.1:
>> system.time(merge(d1,d2,by="seqn",all.x=TRUE))
> user system elapsed
> 0.150 0.067 0.217
>
> R 2.9.0:
>> system.time(merge(d1,d2,by="seqn",all.x=TRUE))
> user system elapsed
> 0.148 0.068 0.216
>
> To substantiate your claim, please provide a reproducible example as well as
> sessionInfo() [and details on how you run it - GUI, CLI, ...], but I suspect
> the difference may be in your data, not R.
>
> Thanks,
> Simon
>
>
> On Aug 12, 2009, at 12:25 , richard_stahlhut at urmc.rochester.edu wrote:
>
>> Full_Name: Rick Stahlhut
>> Version: 2.9.1
>> OS: os x 10.5.7
>> Submission from: (NULL) (128.151.71.23)
>>
>>
>> I upgraded to 2.9.1 today from 2.9.0. I work with large CDC (Centers for
>> Disease Control) datasets and frequently start with a series of 23 large-ish
>> merges to create the final dataset I work on. I do this each time because
>> (a) R is fast, so why not? and (b) the datasets occasionally get updated by
>> CDC and it's easier to swap in new files that way.
>>
>> One such merge is two data.frames with 10 variables and 10,000 rows each.
>> The command in question is:
>>
>> temp = merge (demo.2,ph,by="seqn",all.x=TRUE)
>>
>> in 2.9.0, this command took 3.3 seconds.
>> in 2.9.1, it took 35.8 seconds.
>>
>> I have reverted back to 2.9.0.
>>
>> Additional packages loaded are:
>>
>> library(Hmisc)
>> library(alr3)
>> library(epicalc)
>> library(ggplot2)
>> library(lattice)
>> library(reshape)
>> library(survey)
>> library(car)
>>
>> thanks very much for all the effort. R is wonderful.
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>