[Rd] Notes on building a gcc toolchain for Rtools (but not multilib)
Avraham Adler
avraham.adler at gmail.com
Tue Mar 10 08:03:20 CET 2015
> On Mon, Mar 9, 2015 at 10:40 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
> It's now on the main site at CRAN, and should propagate to the mirrors
> reasonably quickly. I'm hoping that tomorrow's R-devel build will use it,
> but there may be some last minute problems.
Using Rtools 3.3, once it propagated through the CRAN mirrors, I
successfully built a 64-bit version of R on Windows 7, up through make
rinstaller. This build includes ICU_531 and links to 64-bit OpenBLAS
2.13 (4 threads).
As with yesterday's build using 4.9.2-seh (although that one left ICU
out), the only thing that seems to have failed in make check-all is the
internet connectivity, which is disabled by default. Loading R and
calling setInternet2() fixes that, and I plan on using the options
built into the installer I create to have that set at install (like
SDI). Is it at all possible to have that setting exposed in
Mkrules.dist so that it can be set at compile time?
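
Until something like that exists, one workaround is a site-wide startup hook. The following is only a sketch, assuming a Windows build of R 3.x where setInternet2() is available in the utils namespace; the helper name enable_internet2 is made up for illustration, and the code could go in etc/Rprofile.site.

```r
# Hypothetical helper (not part of R or Rtools): switch on internet2
# at startup when the function is available on this platform.
enable_internet2 <- function() {
  on_windows <- identical(.Platform$OS.type, "windows")
  available <- exists("setInternet2", envir = asNamespace("utils"),
                      inherits = FALSE)
  if (on_windows && available) {
    # Look the function up explicitly so this also parses on platforms
    # where utils does not provide it.
    get("setInternet2", envir = asNamespace("utils"))(TRUE)
    return(TRUE)
  }
  FALSE
}

enable_internet2()
```

The function returns FALSE and does nothing on non-Windows platforms or where setInternet2() is not present.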
I also built microbenchmark, which requires packages ‘colorspace’,
‘Rcpp’, ‘stringr’, ‘RColorBrewer’, ‘dichromat’, ‘munsell’, ‘labeling’,
‘plyr’, ‘digest’, ‘gtable’, ‘reshape2’, ‘scales’, ‘proto’, and
‘ggplot2’, and they all worked fine. For what it is worth, I forgot to
uncomment (unhash) Hsiu-Khuern's addition to the NM filter, yet Rcpp
built fine and compiled C++ code fine as well, although about 3%-5%
slower than what I recall from last night's seh version.
So, outside of this hiccup with somehow now needing internet2 (which
may have to do with Microsoft Windows patches, for all I know), which
cannot be set by default, the toolchain seems to be behaving well! I
have not tried building with curl, though; that looks a bit hairier,
although it may address the internet2 issue, who knows.
For anyone interested, below is a comparison of speed across various
versions/compilers. The takeaway for me is that for matrix code a fast
BLAS matters far more than which version of GCC and which
exception-handling model is used. For non-BLAS specific code, at least
on my machine, the SJLJ build performed about 1%–2% *faster*. Go
figure! Maybe someone will run Simon Urbanek's benchmark against them.
Regardless, I'm much less apprehensive about 3.2's release in April.
Thank you, Duncan and all!
Avi
== Speed results compiled over a few months (except for the last two) ==
For the record, all code was run on an Intel i7-2600K overclocked to
4.6 GHz, with 16 GB RAM, under 64-bit Windows 7. Matrices A and B are
1000x1000 dense matrices, of which A is positive semi-definite and B is
not. I use these to test BLAS builds. I hope that the fixed width works
in plain text mode.
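
The A.csv and B.csv files are not included in this message. For anyone who wants to run the test code, here is a minimal sketch of how comparable inputs could be generated; the function name make_test_matrices, the seed, and the scaling are assumptions for illustration, not how the original files were produced.

```r
# Hypothetical stand-ins for A.csv and B.csv.
make_test_matrices <- function(n = 1000L, seed = 1L) {
  set.seed(seed)
  M <- matrix(rnorm(n * n), nrow = n)
  # t(M) %*% M is symmetric positive semi-definite by construction.
  A <- crossprod(M) / n
  # A generic dense matrix: not symmetric, hence not positive semi-definite.
  B <- matrix(rnorm(n * n), nrow = n)
  list(A = A, B = B)
}

m <- make_test_matrices(200L)
isSymmetric(m$A)
min(eigen(m$A, symmetric = TRUE, only.values = TRUE)$values)
isSymmetric(m$B)
```

Calling make_test_matrices() with the default n = 1000L gives matrices of the size used in the benchmarks.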
=== Non-BLAS dependent ===
#Test code
library(microbenchmark)
A <- as.matrix(read.csv(file="F:/R/A.csv", colClasses='numeric'))
B <- as.matrix(read.csv(file="F:/R/B.csv", colClasses='numeric'))
colnames(A) <- colnames(B) <- NULL
Z <- microbenchmark(A + 2, A - 2, A * 2, A / 2, A + B, A - B, A * B,
                    A / B, A ^ 2, sqrt(A),
                    control = list(order = 'block'), times = 1000L)
R-devel_2015-03-08 compiled using
x86_64-4.9.2-release-win32-seh-rt_v3-rev1 (EOPTS = -O3 -march=native
-mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)
Unit: microseconds
expr           min         lq       mean     median         uq       max  neval
A + 2      923.001   1844.215   2205.385   1858.957   1990.900  21714.18   1000
A - 2     1742.652   1830.215   2196.901   1844.810   2507.798  21778.22   1000
A * 2     1743.247   1843.023   2208.374   1860.298   2547.112  21776.43   1000
A/2       2025.598   2111.375   2438.503   2122.097   2701.243  22034.06   1000
A + B     2016.662   2124.182   2554.006   2143.690   2948.896  21964.07   1000
A - B     2004.153   2103.930   2527.219   2128.203   2982.552  22295.27   1000
A * B     2023.215   2119.715   2540.680   2141.010   3154.553  22074.27   1000
A/B       3256.265   3354.700   3633.556   3368.252   3953.950  23189.67   1000
A^2       1745.332   1835.279   2204.023   1850.469   2554.856  21869.66   1000
sqrt(A)  49945.064  50066.434  50506.344  50187.356  50883.403  70006.25   1000
R-devel_2015-03-09 compiled using Rtools 3.3 (GCC 4.9.2, SJLJ, EOPTS =
-O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)
Unit: microseconds
expr           min         lq       mean     median         uq       max  neval
A + 2      925.980   1777.350   2167.326   1791.795   2384.641  21660.28   1000
A - 2     1673.256   1777.648   2188.756   1806.687   2670.715  21724.01   1000
A * 2     1680.999   1786.434   2221.432   1835.130   2766.916  22254.16   1000
A/2       1992.836   2085.165   2450.455   2108.694   2865.203  22803.08   1000
A + B     1977.646   2089.632   2559.912   2121.204   3031.397  22884.99   1000
A - B     1979.135   2081.591   2516.943   2101.398   3003.548  22377.77   1000
A * B     1971.689   2073.699   2510.912   2092.462   2921.345  22308.37   1000
A/B       3247.031   3345.169   3633.351   3361.402   3941.590  23231.97   1000
A^2       1668.788   1771.244   2169.422   1788.220   2745.026  21786.86   1000
sqrt(A)  48662.871  48805.537  49357.270  49003.003  49715.283  69269.10   1000
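
To put a number on the SEH versus SJLJ gap for the non-BLAS expressions, the medians from the two tables above can be compared directly. This is only a sketch: the two builds come from different R-devel snapshots (2015-03-08 and 2015-03-09), so part of the gap may have nothing to do with exception handling. The variable names are illustrative.

```r
# Median timings in microseconds, copied from the two tables above.
expr <- c("A + 2", "A - 2", "A * 2", "A/2", "A + B",
          "A - B", "A * B", "A/B", "A^2", "sqrt(A)")
seh  <- c(1858.957, 1844.810, 1860.298, 2122.097, 2143.690,
          2128.203, 2141.010, 3368.252, 1850.469, 50187.356)
sjlj <- c(1791.795, 1806.687, 1835.130, 2108.694, 2121.204,
          2101.398, 2092.462, 3361.402, 1788.220, 49003.003)

# Positive values mean the SJLJ build had the lower (faster) median.
pct_faster <- setNames(100 * (seh - sjlj) / seh, expr)
round(pct_faster, 2)
```

In these figures every expression has a lower median under the SJLJ build, with the size of the gap varying by expression.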
=== BLAS dependent code (statistics gathered over a few months) ===
#Test code
library(microbenchmark)
library(Matrix)
A <- as.matrix(read.csv(file="F:/R/A.csv", colClasses='numeric'))
B <- as.matrix(read.csv(file="F:/R/B.csv", colClasses='numeric'))
colnames(A) <- colnames(B) <- NULL
Z <- microbenchmark(
  sort(A),
  t(A) %*% B,
  crossprod(A, B),
  solve(A),
  solve(A, diag(A)),
  chol(A),
  chol(B, pivot = TRUE),
  qr(A, LAPACK = TRUE),
  svd(A),
  eigen(A, symmetric = TRUE),
  eigen(A, symmetric = FALSE),
  eigen(B, symmetric = FALSE),
  lu(A),
  fft(A),
  times = 100L, unit = 'ms', control = list(order = 'block'))
REFERENCE 3.1.1 compiled using Rtools 3.1 (GCC 4.6.3, default EOPTS flags)
reference BLAS
Unit: milliseconds
expr                                 min           lq         mean       median           uq         max  neval
sort(A)                        89.364120    90.760662    95.096270    91.561537    92.573725   154.081306    100
t(A) %*% B                    463.145756   470.406496   487.680120   474.872066   490.043866   642.640917    100
crossprod(A, B)               727.114903   729.128111   730.031458   729.785877   731.120320   733.078130    100
solve(A)                      600.629979   604.814394   630.598703   608.606561   658.326032   662.879314    100
solve(A, diag(A))             145.738089   146.774104   147.629655   147.959780   148.371535   148.883512    100
chol(A)                       115.873110   116.019644   117.347118   116.212938   118.026150   172.853468    100
chol(B, pivot = TRUE)           2.415134     2.548564     3.227905     2.559286     4.568473     4.689393    100
qr(A, LAPACK = TRUE)          414.455301   416.033671   418.583569   416.972741   417.814271   473.541941    100
svd(A)                       1952.765952  1957.070246  1974.547371  1959.374735  2010.263499  2017.405106    100
eigen(A, symmetric = TRUE)    917.120317   920.482414   923.423802   921.784990   924.577926   980.692929    100
eigen(A, symmetric = FALSE)  2981.049436  2985.640691  3007.526012  2991.149276  3014.926832  3130.924137    100
eigen(B, symmetric = FALSE)  3964.874086  3974.978839  3999.080880  3991.973829  4019.799690  4078.083071    100
lu(A)                         137.437464   138.229850   141.696849   138.906528   142.217546   198.202991    100
fft(A)                        109.981065   110.321042   111.753592   110.640916   111.268152   116.670410    100
3.1.2 compiled using Rtools 3.2 (GCC 4.6.3, EOPTS = -march=native -O3
-std=gnu++0x -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper --param
l1-cache-line-size=64 --param l1-cache-size=64 --param
l2-cache-size=256)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)
Unit: milliseconds
expr                                 min           lq         mean       median           uq         max  neval
sort(A)                        88.771066    89.748265    94.542642    90.596947    91.482709   149.171214    100
t(A) %*% B                     27.507195    33.359067    40.378088    37.689446    41.512909    96.868916    100
crossprod(A, B)                17.783759    22.327538    26.787467    27.059399    31.918288    36.209055    100
solve(A)                       45.964657    54.856090    80.761447    60.499775   109.150759   118.817308    100
solve(A, diag(A))              24.704266    26.370058    26.805694    26.936840    27.400868    29.522052    100
chol(A)                         6.762058     7.088337     8.725137     8.145653     8.973040    65.570275    100
chol(B, pivot = TRUE)           2.558110     2.702412     3.481314     2.831076     4.789643     5.346446    100
qr(A, LAPACK = TRUE)           78.757538    81.620631    85.132413    82.940043    85.099350   141.434937    100
svd(A)                        361.539846   366.637747   386.533779   370.769323   421.736275   445.087770    100
eigen(A, symmetric = TRUE)    174.249560   180.402841   186.649060   182.628715   188.931063   241.414148    100
eigen(A, symmetric = FALSE)   734.881721   744.303748   772.203936   751.104077   795.883051   915.351575    100
eigen(B, symmetric = FALSE)  2522.750166  2551.112148  2596.798329  2581.940655  2633.440287  2861.722717    100
lu(A)                          20.277535    21.227185    25.068971    23.319926    25.130468    84.837552    100
fft(A)                        109.757747   110.347313   112.123488   110.725415   114.057152   120.250492    100
R-devel_2015-03-09 compiled using Rtools 3.3 (GCC 4.9.2, SJLJ, EOPTS =
-O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)
Unit: milliseconds
expr                                 min           lq         mean       median           uq         max  neval
sort(A)                        88.025153    88.255828    92.701967    89.571826    90.320888    146.40380    100
t(A) %*% B                     26.471552    30.866301    35.293662    34.069253    38.490212     85.57007    100
crossprod(A, B)                17.606699    17.898879    23.999433    22.228699    28.620007     37.06744    100
solve(A)                       43.410199    48.448279    54.914690    51.338798    55.865639    116.81746    100
solve(A, diag(A))              24.655633    25.414227    27.692980    27.301179    28.757458     38.95692    100
chol(A)                         6.620942     6.891379     8.010618     7.474695     8.233586     12.62357    100
chol(B, pivot = TRUE)           2.456867     2.541751     3.737836     2.575556     2.722390     61.46246    100
qr(A, LAPACK = TRUE)           78.153905    80.980389    83.663278    82.458112    84.998671    101.89696    100
svd(A)                        353.204099   365.191932   390.446252   377.001957   417.792818    475.73975    100
eigen(A, symmetric = TRUE)    173.627391   177.985954   186.068097   182.131711   187.866286    251.19902    100
eigen(A, symmetric = FALSE)   771.643075   788.242038   813.902106   801.689427   839.380539    921.24119    100
eigen(B, symmetric = FALSE)  2591.501370  2644.449833  2691.339277  2678.241053  2722.924657   2935.76884    100
lu(A)                          19.969747    20.959164    24.298874    22.426017    24.017664     81.95253    100
fft(A)                        106.862816   107.191480   108.985064   107.466682   110.465762    115.73511    100
R-devel_2015-03-08 compiled using
x86_64-4.9.2-release-win32-seh-rt_v3-rev1 (EOPTS = -O3 -march=native
-mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC
4.9.1 (MinGW-64)
Unit: milliseconds
expr                                 min           lq         mean       median           uq         max  neval
sort(A)                        88.372432    88.811892    93.321491    90.093638    90.754540    150.02760    100
t(A) %*% B                     26.583837    30.443074    34.765044    33.903505    37.455374     82.54761    100
crossprod(A, B)                17.715707    22.088566    26.875521    27.185023    31.154311     36.72850    100
solve(A)                       44.112203    49.217298    55.707862    52.651668    57.331152    116.44069    100
solve(A, diag(A))              24.891819    25.468731    27.590369    27.302520    29.217172     37.90168    100
chol(A)                         6.658469     6.872168     7.893779     7.058167     8.968203     13.32230    100
chol(B, pivot = TRUE)           2.451208     2.529540     3.742339     2.578981     2.646143     62.62224    100
qr(A, LAPACK = TRUE)           78.839230    80.413602    82.989497    81.778148    84.447373     98.13199    100
svd(A)                        352.931278   362.746235   387.952468   374.631166   415.481743    500.52405    100
eigen(A, symmetric = TRUE)    172.696946   178.109557   187.816872   181.375053   190.414291    256.44276    100
eigen(A, symmetric = FALSE)   778.904964   793.941318   820.598107   812.244809   841.944627    919.02527    100
eigen(B, symmetric = FALSE)  2494.645617  2514.200623  2562.484197  2561.112354  2586.092481   2806.00525    100
lu(A)                          19.762154    20.663114    24.555941    22.403382    24.369411     80.98218    100
fft(A)                        106.374956   107.120148   108.625520   107.433176   108.786850    116.43563    100
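
The point that BLAS matters more than the compiler can be read straight off the medians. The sketch below uses a subset of the rows from the tables above and compares the reference-BLAS build against the Rtools 3.3 (SJLJ) build, and the SEH build against the SJLJ build. Note that the reference build also differs in R version, GCC version, and flags, so the first ratio does not isolate BLAS alone. The variable names are illustrative.

```r
# Median timings in milliseconds, copied from the tables above.
expr <- c("sort(A)", "crossprod(A, B)", "solve(A)",
          "chol(A)", "svd(A)", "fft(A)")
ref  <- c(91.561537, 729.785877, 608.606561,
          116.212938, 1959.374735, 110.640916)  # R 3.1.1, reference BLAS
sjlj <- c(89.571826, 22.228699, 51.338798,
          7.474695, 377.001957, 107.466682)     # Rtools 3.3, OpenBLAS
seh  <- c(90.093638, 27.185023, 52.651668,
          7.058167, 374.631166, 107.433176)     # 4.9.2-seh, OpenBLAS

# How many times faster the OpenBLAS build is than the reference build.
blas_gain <- setNames(ref / sjlj, expr)
# SEH median relative to SJLJ median; values near 1 mean little difference.
eh_ratio  <- setNames(seh / sjlj, expr)

round(cbind(blas_gain, eh_ratio), 2)
```

The BLAS-backed rows (crossprod, solve, chol, svd) show gains of several-fold or more, while sort() and fft(), which do not go through BLAS, barely move; the SEH/SJLJ ratios stay close to 1 throughout.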