[Rd] Notes on building a gcc toolchain for Rtools (but not multilib)

Avraham Adler avraham.adler at gmail.com
Tue Mar 10 08:03:20 CET 2015


> On Mon, Mar 9, 2015 at 10:40 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
> It's now on the main site at CRAN, and should propagate to the mirrors
> reasonably quickly.  I'm hoping that tomorrow's R-devel build will use it,
> but there may be some last minute problems.

Using Rtools 3.3, once it had propagated through the CRAN mirrors, I
successfully built a 64-bit version of R on Windows 7, up through make
rinstaller. This build includes ICU (ICU_531) and links against 64-bit
OpenBLAS 0.2.13 (4 threads).

As with yesterday's build using 4.9.2-seh (although that one left ICU
out), the only thing that appears to fail in make check-all is
internet connectivity, which is disabled by default. Loading R and
calling setInternet2() fixes that, and I plan to use the options
built into the installer I create to have that set at install time
(like SDI). Is it at all possible to have that setting exposed in
Mkrules.dist so that it can be set at compile time?
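
For a per-session workaround, the call can be wrapped defensively, for
example in an Rprofile.site. This is only a sketch: enable_internet2 is
a hypothetical helper name, and it assumes a Windows build of R in which
setInternet2() is still available; anywhere else it does nothing.

```r
# Hypothetical helper: turn on internet2 where that is possible.
enable_internet2 <- function() {
  # setInternet2() is Windows-only; skip quietly elsewhere.
  if (.Platform$OS.type != "windows") return(FALSE)
  if (!exists("setInternet2", mode = "function")) return(FALSE)
  # Guard the call in case this version of R no longer supports it.
  tryCatch({
    setInternet2(TRUE)
    TRUE   # the call ran without error
  }, error = function(e) FALSE)
}

internet2_on <- enable_internet2()
```

The helper returns a logical so that a startup script can report whether
the setting was applied.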

I also built microbenchmark, which requires packages ‘colorspace’,
‘Rcpp’, ‘stringr’, ‘RColorBrewer’, ‘dichromat’, ‘munsell’, ‘labeling’,
‘plyr’, ‘digest’, ‘gtable’, ‘reshape2’, ‘scales’, ‘proto’, and
‘ggplot2’, and they all worked fine. For what it is worth, I forgot to
uncomment (unhash) Hsiu-Khuern's addition to the NM filter, yet Rcpp
built fine and compiled C++ code fine as well, although about 3%-5%
slower than what I recall from last night's seh version.

So, apart from this hiccup of somehow now needing internet2 (which
may have to do with Microsoft Windows patches, for all I know), which
cannot be set by default, it seems as if the toolchain is behaving
well! I have not tried building with curl, though; that looks a bit
hairier, although it may address the internet2 issue. Who knows.

For what it is worth, below is a comparison of speed across various
versions and compilers. The takeaway for me is that for matrix code a
fast BLAS is significantly more important than which version of GCC
and which exception-handling model is used. For non-BLAS-specific
code, at least on my machine, the SJLJ build performed about 1%–2%
*faster*. Go figure! Maybe someone will run Simon Urbanek's benchmark
against them.

Regardless, I'm much less apprehensive about 3.2's release in April.
Thank you, Duncan and all!


Avi



== Speed results compiled over a few months (except for the last two) ==

For the record, all code was run on an Intel i7-2600K overclocked to
4.6 GHz, with 16 GB RAM, under 64-bit Windows 7. Matrices A and B are
1000x1000 dense matrices, of which A is positive semi-definite and B
is not. I use them to test BLAS builds. I hope that the fixed-width
layout survives in plain text mode.
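
The A.csv and B.csv files are not included here. For anyone who wants to
rerun the tests, stand-in matrices with the same stated properties could
be generated roughly as follows. This is a sketch under assumptions: the
seed, the use of rnorm(), and the helper matrix X are all illustrative,
and the timings will not match the original data exactly.

```r
set.seed(1)                      # arbitrary seed, for repeatability only
n <- 1000

# t(X) %*% X is symmetric and positive semi-definite by construction.
X <- matrix(rnorm(n * n), nrow = n, ncol = n)
A <- crossprod(X)

# A generic dense matrix: not symmetric, hence not positive semi-definite.
B <- matrix(rnorm(n * n), nrow = n, ncol = n)
```

With A and B already in memory, the read.csv() lines in the test code can
simply be skipped.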

=== Non-BLAS dependent ===

#Test code
library(microbenchmark)
A <- as.matrix(read.csv(file="F:/R/A.csv", colClasses='numeric'))
B <- as.matrix(read.csv(file="F:/R/B.csv", colClasses='numeric'))
colnames(A) <- colnames(B) <- NULL
Z <- microbenchmark(A + 2, A - 2, A * 2, A / 2, A + B, A - B, A * B,
    A / B, A ^ 2, sqrt(A), control=list(order = 'block'), times = 1000L)


R-devel_2015-03-08 compiled using
x86_64-4.9.2-release-win32-seh-rt_v3-rev1 (EOPTS = -O3 -march=native
-mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 0.2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)

Unit: microseconds
    expr       min        lq      mean    median        uq      max neval
   A + 2   923.001  1844.215  2205.385  1858.957  1990.900 21714.18  1000
   A - 2  1742.652  1830.215  2196.901  1844.810  2507.798 21778.22  1000
   A * 2  1743.247  1843.023  2208.374  1860.298  2547.112 21776.43  1000
     A/2  2025.598  2111.375  2438.503  2122.097  2701.243 22034.06  1000
   A + B  2016.662  2124.182  2554.006  2143.690  2948.896 21964.07  1000
   A - B  2004.153  2103.930  2527.219  2128.203  2982.552 22295.27  1000
   A * B  2023.215  2119.715  2540.680  2141.010  3154.553 22074.27  1000
     A/B  3256.265  3354.700  3633.556  3368.252  3953.950 23189.67  1000
     A^2  1745.332  1835.279  2204.023  1850.469  2554.856 21869.66  1000
 sqrt(A) 49945.064 50066.434 50506.344 50187.356 50883.403 70006.25  1000


R-devel_2015-03-09 compiled using Rtools 3.3 (GCC 4.9.2, SJLJ, EOPTS =
-O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 0.2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)

Unit: microseconds
    expr       min        lq      mean    median        uq      max neval
   A + 2   925.980  1777.350  2167.326  1791.795  2384.641 21660.28  1000
   A - 2  1673.256  1777.648  2188.756  1806.687  2670.715 21724.01  1000
   A * 2  1680.999  1786.434  2221.432  1835.130  2766.916 22254.16  1000
     A/2  1992.836  2085.165  2450.455  2108.694  2865.203 22803.08  1000
   A + B  1977.646  2089.632  2559.912  2121.204  3031.397 22884.99  1000
   A - B  1979.135  2081.591  2516.943  2101.398  3003.548 22377.77  1000
   A * B  1971.689  2073.699  2510.912  2092.462  2921.345 22308.37  1000
     A/B  3247.031  3345.169  3633.351  3361.402  3941.590 23231.97  1000
     A^2  1668.788  1771.244  2169.422  1788.220  2745.026 21786.86  1000
 sqrt(A) 48662.871 48805.537 49357.270 49003.003 49715.283 69269.10  1000
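
As a rough check on the two non-BLAS tables above, the median columns can
be placed side by side. The numbers below are copied from those tables;
the vector names seh and sjlj are just labels chosen for this sketch, and
a single machine's medians are of course only indicative.

```r
# Medians in microseconds, copied from the SEH and SJLJ tables above.
expr <- c("A + 2", "A - 2", "A * 2", "A/2", "A + B",
          "A - B", "A * B", "A/B", "A^2", "sqrt(A)")
seh  <- c(1858.957, 1844.810, 1860.298, 2122.097, 2143.690,
          2128.203, 2141.010, 3368.252, 1850.469, 50187.356)
sjlj <- c(1791.795, 1806.687, 1835.130, 2108.694, 2121.204,
          2101.398, 2092.462, 3361.402, 1788.220, 49003.003)

# Positive values mean the SJLJ build had the lower (faster) median.
pct_faster <- round(100 * (1 - sjlj / seh), 1)
print(data.frame(expr, seh, sjlj, pct_faster))
```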


=== BLAS dependent code (statistics gathered over a few months) ===

#Test code
library(microbenchmark)
library(Matrix)
A <- as.matrix(read.csv(file="F:/R/A.csv", colClasses='numeric'))
B <- as.matrix(read.csv(file="F:/R/B.csv", colClasses='numeric'))
colnames(A) <- colnames(B) <- NULL

Z <- microbenchmark(
    sort(A),
    t(A) %*% B,
    crossprod(A, B),
    solve(A),
    solve(A, diag(A)),
    chol(A),
    chol(B, pivot = TRUE),
    qr(A, LAPACK=TRUE),
    svd(A),
    eigen(A, symmetric = TRUE),
    eigen(A, symmetric = FALSE),
    eigen(B, symmetric = FALSE),
    lu(A),
    fft(A),
    times=100L, unit='ms', control = list(order = 'block'))


REFERENCE 3.1.1 compiled using Rtools 3.1 (GCC 4.6.3, default EOPTS flags)
reference BLAS

Unit: milliseconds
                        expr         min          lq        mean      median          uq         max neval
                     sort(A)   89.364120   90.760662   95.096270   91.561537   92.573725  154.081306   100
                  t(A) %*% B  463.145756  470.406496  487.680120  474.872066  490.043866  642.640917   100
             crossprod(A, B)  727.114903  729.128111  730.031458  729.785877  731.120320  733.078130   100
                    solve(A)  600.629979  604.814394  630.598703  608.606561  658.326032  662.879314   100
           solve(A, diag(A))  145.738089  146.774104  147.629655  147.959780  148.371535  148.883512   100
                     chol(A)  115.873110  116.019644  117.347118  116.212938  118.026150  172.853468   100
       chol(B, pivot = TRUE)    2.415134    2.548564    3.227905    2.559286    4.568473    4.689393   100
        qr(A, LAPACK = TRUE)  414.455301  416.033671  418.583569  416.972741  417.814271  473.541941   100
                      svd(A) 1952.765952 1957.070246 1974.547371 1959.374735 2010.263499 2017.405106   100
  eigen(A, symmetric = TRUE)  917.120317  920.482414  923.423802  921.784990  924.577926  980.692929   100
 eigen(A, symmetric = FALSE) 2981.049436 2985.640691 3007.526012 2991.149276 3014.926832 3130.924137   100
 eigen(B, symmetric = FALSE) 3964.874086 3974.978839 3999.080880 3991.973829 4019.799690 4078.083071   100
                       lu(A)  137.437464  138.229850  141.696849  138.906528  142.217546  198.202991   100
                      fft(A)  109.981065  110.321042  111.753592  110.640916  111.268152  116.670410   100


3.1.2 compiled using Rtools 3.2 (GCC 4.6.3, EOPTS = -march=native -O3
-std=gnu++0x -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper --param
l1-cache-line-size=64 --param l1-cache-size=64 --param
l2-cache-size=256)
OpenBLAS 0.2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)

Unit: milliseconds
                        expr         min          lq        mean      median          uq         max neval
                     sort(A)   88.771066   89.748265   94.542642   90.596947   91.482709  149.171214   100
                  t(A) %*% B   27.507195   33.359067   40.378088   37.689446   41.512909   96.868916   100
             crossprod(A, B)   17.783759   22.327538   26.787467   27.059399   31.918288   36.209055   100
                    solve(A)   45.964657   54.856090   80.761447   60.499775  109.150759  118.817308   100
           solve(A, diag(A))   24.704266   26.370058   26.805694   26.936840   27.400868   29.522052   100
                     chol(A)    6.762058    7.088337    8.725137    8.145653    8.973040   65.570275   100
       chol(B, pivot = TRUE)    2.558110    2.702412    3.481314    2.831076    4.789643    5.346446   100
        qr(A, LAPACK = TRUE)   78.757538   81.620631   85.132413   82.940043   85.099350  141.434937   100
                      svd(A)  361.539846  366.637747  386.533779  370.769323  421.736275  445.087770   100
  eigen(A, symmetric = TRUE)  174.249560  180.402841  186.649060  182.628715  188.931063  241.414148   100
 eigen(A, symmetric = FALSE)  734.881721  744.303748  772.203936  751.104077  795.883051  915.351575   100
 eigen(B, symmetric = FALSE) 2522.750166 2551.112148 2596.798329 2581.940655 2633.440287 2861.722717   100
                       lu(A)   20.277535   21.227185   25.068971   23.319926   25.130468   84.837552   100
                      fft(A)  109.757747  110.347313  112.123488  110.725415  114.057152  120.250492   100
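
To put a number on how much the BLAS matters, the medians of the two
tables above (reference BLAS versus the first OpenBLAS build) can be
divided. The values are copied from those tables; the names ref and
openblas are labels for this sketch only. Note that the two builds also
differ in R version and compiler flags, so the ratio is not a pure BLAS
effect. fft(A) is included as a control that does not go through BLAS.

```r
# Medians in milliseconds, copied from the two tables above.
expr     <- c("t(A) %*% B", "crossprod(A, B)", "solve(A)",
              "chol(A)", "svd(A)", "fft(A)")
ref      <- c(474.872066, 729.785877, 608.606561,
              116.212938, 1959.374735, 110.640916)
openblas <- c(37.689446, 27.059399, 60.499775,
              8.145653, 370.769323, 110.725415)

# Ratio > 1 means the OpenBLAS build had the lower (faster) median.
speedup <- ref / openblas
print(data.frame(expr, ref, openblas, speedup = round(speedup, 2)))
```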


R-devel_2015-03-09 compiled using Rtools 3.3 (GCC 4.9.2, SJLJ, EOPTS =
-O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 0.2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)

Unit: milliseconds
                        expr         min          lq        mean      median          uq        max neval
                     sort(A)   88.025153   88.255828   92.701967   89.571826   90.320888  146.40380   100
                  t(A) %*% B   26.471552   30.866301   35.293662   34.069253   38.490212   85.57007   100
             crossprod(A, B)   17.606699   17.898879   23.999433   22.228699   28.620007   37.06744   100
                    solve(A)   43.410199   48.448279   54.914690   51.338798   55.865639  116.81746   100
           solve(A, diag(A))   24.655633   25.414227   27.692980   27.301179   28.757458   38.95692   100
                     chol(A)    6.620942    6.891379    8.010618    7.474695    8.233586   12.62357   100
       chol(B, pivot = TRUE)    2.456867    2.541751    3.737836    2.575556    2.722390   61.46246   100
        qr(A, LAPACK = TRUE)   78.153905   80.980389   83.663278   82.458112   84.998671  101.89696   100
                      svd(A)  353.204099  365.191932  390.446252  377.001957  417.792818  475.73975   100
  eigen(A, symmetric = TRUE)  173.627391  177.985954  186.068097  182.131711  187.866286  251.19902   100
 eigen(A, symmetric = FALSE)  771.643075  788.242038  813.902106  801.689427  839.380539  921.24119   100
 eigen(B, symmetric = FALSE) 2591.501370 2644.449833 2691.339277 2678.241053 2722.924657 2935.76884   100
                       lu(A)   19.969747   20.959164   24.298874   22.426017   24.017664   81.95253   100
                      fft(A)  106.862816  107.191480  108.985064  107.466682  110.465762  115.73511   100


R-devel_2015-03-08 compiled using
x86_64-4.9.2-release-win32-seh-rt_v3-rev1 (EOPTS = -O3 -march=native
-mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 0.2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)

Unit: milliseconds
                        expr         min          lq        mean      median          uq        max neval
                     sort(A)   88.372432   88.811892   93.321491   90.093638   90.754540  150.02760   100
                  t(A) %*% B   26.583837   30.443074   34.765044   33.903505   37.455374   82.54761   100
             crossprod(A, B)   17.715707   22.088566   26.875521   27.185023   31.154311   36.72850   100
                    solve(A)   44.112203   49.217298   55.707862   52.651668   57.331152  116.44069   100
           solve(A, diag(A))   24.891819   25.468731   27.590369   27.302520   29.217172   37.90168   100
                     chol(A)    6.658469    6.872168    7.893779    7.058167    8.968203   13.32230   100
       chol(B, pivot = TRUE)    2.451208    2.529540    3.742339    2.578981    2.646143   62.62224   100
        qr(A, LAPACK = TRUE)   78.839230   80.413602   82.989497   81.778148   84.447373   98.13199   100
                      svd(A)  352.931278  362.746235  387.952468  374.631166  415.481743  500.52405   100
  eigen(A, symmetric = TRUE)  172.696946  178.109557  187.816872  181.375053  190.414291  256.44276   100
 eigen(A, symmetric = FALSE)  778.904964  793.941318  820.598107  812.244809  841.944627  919.02527   100
 eigen(B, symmetric = FALSE) 2494.645617 2514.200623 2562.484197 2561.112354 2586.092481 2806.00525   100
                       lu(A)   19.762154   20.663114   24.555941   22.403382   24.369411   80.98218   100
                      fft(A)  106.374956  107.120148  108.625520  107.433176  108.786850  116.43563   100


