[R-SIG-Win] R4.2.3 slower than R4.1.3 on Windows only

Tomas Kalibera tomas.kalibera at gmail.com
Wed Jun 7 10:32:40 CEST 2023


On 6/6/23 11:42, Fredrik Skoog wrote:
> Hi,
>
> I tried a real-world example (quant finance: reading in some data,
> transforming/calculating, writing back to files). Below are the
> runtimes of the script with some different versions:
>
> Latest R-devel build: 15 minutes
> R 4.1.3: 10 minutes
> R 4.2.3: 23 minutes
>
> So the latest R-devel build is a lot faster than R 4.2.3, but still
> slower than R 4.1.3.

Great, that's quite good news. Thanks for running the experiments.

Best,
Tomas

>
> Best regards,
>
> Fredrik
>
>
>
>
> On Mon, 22 May 2023 at 17:29, Tomas Kalibera
> <tomas.kalibera at gmail.com> wrote:
>
>
>     On 5/17/23 16:07, Tomas Kalibera wrote:
>     >
>     > On 4/18/23 14:16, Fredrik Skoog wrote:
>     >> Hi,
>     >>
>     >> If you run:
>     >>
>     >> library(microbenchmark)
>     >> m <- matrix(rnorm(28000000), nrow=7000, byrow=TRUE)
>     >> rownames(m) <- rownames(m, do.NULL = FALSE, prefix = "this is a row name")
>     >> colnames(m) <- colnames(m, do.NULL = FALSE, prefix = "this is a column name")
>     >> microbenchmark(df <- as.data.frame(m, keep.rownames=TRUE), times=10)
>     >>
>     >> The results show worse performance (and bigger variation) in
>     >> R 4.2.3 compared to 4.1.3. v4.2.0 also shows worse performance,
>     >> so it looks like it is 4.2.0 and later that have this issue. On
>     >> Linux it's all good, so it seems to be a Windows-only issue.
>     >>
>     >> Version 4.2.3
>     >> =============
>     >>
>     >> Run 1
>     >> -----
>     >> Unit: seconds
>     >>                                          expr      min       lq     mean   median       uq      max neval
>     >>  df <- as.data.frame(m, keep.rownames = TRUE) 1.324839 2.411304 2.760553 2.593452 3.290228 4.263175    10
>     >>
>     >> Run 2
>     >> -----
>     >> Unit: milliseconds
>     >>                                          expr      min     lq     mean   median       uq     max neval
>     >>  dt <- as.data.frame(m, keep.rownames = TRUE) 967.5651 1054.8 1155.453 1149.767 1194.742 1451.14    10
>     >>
>     >> Version 4.1.3
>     >> =============
>     >>
>     >> Run 1
>     >> -----
>     >> Unit: milliseconds
>     >>                                          expr      min       lq     mean   median       uq      max neval
>     >>  df <- as.data.frame(m, keep.rownames = TRUE) 274.5478 298.2477 320.3988 320.9164 342.8119 375.6841    10
>     >>
>     >> Run 2
>     >> -----
>     >> Unit: milliseconds
>     >>                                          expr      min       lq     mean   median       uq      max neval
>     >>  df <- as.data.frame(m, keep.rownames = TRUE) 278.5369 310.0312 313.0745 313.3275 320.0294 343.7539    10
>     >>
>     >> I have tried it on two different machines, with the same result.
>     >>
>     >> -----
>     >>
>     >> The above example is just trying to do something simple that
>     >> exposes the issue, but as.data.table behaves similarly (see the
>     >> probe below). It also shows huge variation in timings. We had a
>     >> script that ran in 12 minutes with v3.6.3; it took 18 minutes
>     >> with v4.2.3, while with v4.1.3 it takes around 9 minutes.
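>     >>
>     >> For example, a minimal data.table probe (assuming data.table is
>     >> installed; keep.rownames is an argument of as.data.table):
>     >>
>     >> library(data.table)
>     >> microbenchmark(dt <- as.data.table(m, keep.rownames = TRUE), times=10)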
>     >>
>     >> Has anyone else noticed this? I noticed in the release notes
>     >> that Doug Lea's malloc was replaced in v4.2.0, and that this is
>     >> a Windows-only change.
>     >
>     > Thanks for the report. I confirm the slowdown with this example,
>     > and I confirm it is due to the change of memory allocator: I
>     > switched my working copy of R-devel back to the original version
>     > of dlmalloc, which removed the slowdown.
>     >
>     > Windows 10 (build 19041 and later) makes it possible to choose
>     > the more recent Segment Heap allocator instead of the default
>     > Low Fragmentation Heap allocator. With this example it gives
>     > almost the same performance as the original version of dlmalloc,
>     > without the maintenance overhead of using a custom allocator, so
>     > this might be one possible solution.
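>     >
>     > A quick way to check from R that the running build is at least
>     > 19041 (a best-effort sketch, assuming the "(build NNNNN)" format
>     > that utils::win.version() reports on Windows):
>     >
>     > v <- utils::win.version()
>     > as.integer(sub(".*build ([0-9]+).*", "\\1", v)) >= 19041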
>
>     Hi Fredrik,
>
>     We made R-devel use the Segment Heap on recent Windows as an
>     experiment. Could you please check the performance implications
>     for the real application on which you based the example
>     micro-benchmark? Did it improve performance for you?
>
>     Indeed, if you have access to other memory-intensive real
>     applications with real data, it would be useful to check with
>     those as well.
>
>     Microbenchmarks are tricky. While yours works much better with the
>     Segment Heap, my colleague found another one which works much
>     better with the Low Fragmentation Heap.
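>
>     For instance (a hypothetical probe, not the benchmark my colleague
>     used), allocation patterns dominated by many small objects stress
>     the allocator quite differently from a few large allocations:
>
>     library(microbenchmark)
>     microbenchmark(
>         small = lapply(1:10000, function(i) numeric(100)),
>         large = matrix(0, 5000, 5000),
>         times = 10)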
>
>     Thanks
>     Tomas
>
>     >
>     > Best
>     > Tomas
>     >
>     >>
>     >> Best regards,
>     >>
>     >> Fredrik
>     >>
>