[Rd] configure can't find dgemm in MKL10
Christopher Paciorek
paciorek at hsph.harvard.edu
Tue Apr 22 02:23:21 CEST 2008
Just to follow up on my previous email with some results about this that may be helpful as a guide for others... this was on a Linux box, Intel processor, R 2.6.2.
Following Prof. Ripley's suggestion, we went the shared BLAS route and were able to get this working using Goto BLAS.
The combination of using the Intel compilers in place of the GNU compilers, and Goto BLAS in place of R's internal BLAS, gave an order-of-magnitude speedup in basic linear algebra routines.
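For reference, the shared-BLAS route looks roughly like the sketch below. This is not our exact setup (the libgoto.so name and its path here are placeholders), but it shows the key steps: build R with --enable-BLAS-shlib, then point R's shared BLAS at the optimized library.

./configure CC=icc F77=ifort --enable-BLAS-shlib
make
# R now links against its own shared BLAS, lib/libRblas.so, which can be
# replaced by (a symlink to) an optimized BLAS such as Goto:
cd lib
mv libRblas.so libRblas.so.keep
ln -s /path/to/libgoto.so libRblas.so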
Here's an example of some comparative timings:
Intel compilers, Goto BLAS:
> mat=matrix(rnorm(2000*2000),2000,2000)
> system.time(mat%*%mat)
user system elapsed
2.203 0.061 2.270
> cov=t(mat)%*%mat
> system.time(chol(cov))
user system elapsed
0.442 0.046 0.496
Gnu compilers, internal R BLAS:
> mat=matrix(rnorm(2000*2000),2000,2000)
> system.time(mat%*%mat)
user system elapsed
46.695 0.058 46.793
> cov=t(mat)%*%mat
> system.time(chol(cov))
user system elapsed
4.871 0.026 4.902
I didn't go back and check, but I believe these timings are roughly equivalent to what I get from the R 2.6.2 Mac OS X binary downloaded directly from CRAN, running on my dual-core Intel MacBook.
-chris
>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 04/18/08 9:45 AM >>>
Did you see

   See 'Shared BLAS' for an alternative (and in many ways preferable)
   way to use MKL.

? That's an easier route to get working, and you can swap BLASes almost
instantly.
But you need to look at the config.log to see what went wrong.
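For example, something along these lines (a rough sketch; adjust the pattern as needed) will locate the failed dgemm_ test, and the surrounding lines show the link command and its error output:

grep -n 'dgemm_' config.log | head
# then open config.log at the reported line numbers to see the exact
# compiler/linker invocation and the error it produced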
'Xeon' covers a multitude of processors, but my group's experience is that
for recent Intel CPUs the Goto BLAS beats all others (including MKL and
ATLAS). As you are in academia it is available to you, and it too is easy
to swap in.
On Fri, 18 Apr 2008, Christopher Paciorek wrote:
> Hi,
> I'm trying to follow the R-admin instructions for using MKL 10 as the external BLAS when compiling R-2.6.2 under Linux on a RH EL head node of a cluster. The configure process seems to have problems when it checks for dgemm in the BLAS. I'm using configure as:
> ./configure CC=icc F77=ifort --with-lapack="$MKL" --with-blas="$MKL"
> where $MKL is defined as in R-admin section A.3.1.4.
>
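> (For concreteness, based on the link line configure reports below, the definition amounts to roughly the following. This is reconstructed from that output rather than copied from my build script, and MKL_LIB_PATH is just shorthand:)
>
> MKL_LIB_PATH=/usr1/util/Intel/mkl/10.0.1.014/lib/em64t
> MKL="-L${MKL_LIB_PATH} -Wl,--start-group \
>   ${MKL_LIB_PATH}/libmkl_gf_lp64.a \
>   ${MKL_LIB_PATH}/libmkl_gnu_thread.a \
>   ${MKL_LIB_PATH}/libmkl_core.a \
>   -Wl,--end-group -liomp5 -lguide -lpthread -lgomp"
>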
> checking for cblas_cdotu_sub in vecLib framework... no
> checking for dgemm_ in -L/usr1/util/Intel/mkl/10.0.1.014/lib/em64t -Wl,--start-group /usr1/util/Intel/mkl/10.0.1.014/lib/em64t/libmkl_gf_lp64.a /usr1/util/Intel/mkl/10.0.1.014/lib/em64t/libmkl_gnu_thread.a /usr1/util/Intel/mkl/10.0.1.014/lib/em64t/libmkl_core.a -Wl,--end-group -liomp5 -lguide -lpthread -lgomp... no
> checking for dgemm_... no
> checking for ATL_xerbla in -latlas... yes
> checking for dgemm_ in -lf77blas... no
> checking for dgemm_ in -lblas... yes
> checking for dgemm_ in -ldgemm... no
> checking for dgemm_ in -lblas... (cached) yes
> checking for dgemm_ in -lessl... no
> checking for dgemm_ in -lblas... (cached) yes
>
> I've looked in the MKL .a files and do not actually see dgemm or dgemm_ explicitly. That would seem to explain the configure result: BLAS_LIBS is not set to point to MKL but defaults to the usual Rblas, based on BLAS_LIBS in Makeconf (BLAS_LIBS = -L$(R_HOME)/lib$(R_ARCH) -lRblas) and the absence of any mention of BLAS in the 'External libraries' line at the end of the configure output.
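>
> (A quick way to confirm which BLAS configure settled on, run from the top of the build tree; for an installed R the file lives at $R_HOME/etc/Makeconf:)
>
> grep '^BLAS_LIBS' etc/Makeconf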
>
> In looking for dgemm in the MKL .a files (ar t libName.a | grep dgemm):
> libmkl_gf_lp64.a lists cblas_dgemm_lp64.o, _dgemm_lp64.o
> libmkl_core.a lists a bunch of things with dgemm in the name but not dgemm itself, e.g. _dgemm_kernel_0_fb.o, _mc_dgemm_bufs_0.o
> libmkl_gnu_thread.a lists dgemm_omp.o
>
> Incidentally _dgemm.o is listed in libmkl_gf_ilp64.a.
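>
> (Side note: ar t only lists the member object file names, not the symbols they define, so a symbol-level check is more telling. A sketch using GNU nm, which understands .a archives:)
>
> nm -g --defined-only libmkl_gf_lp64.a | grep -w dgemm_
> nm -g --defined-only libmkl_gnu_thread.a | grep -w dgemm_
> nm -g --defined-only libmkl_core.a | grep -w dgemm_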
>
> We're running Red Hat Enterprise Linux AS release 4 (Nahant Update 5) on
> an Intel Xeon head node of a cluster.
>
> Incidentally, this has come about because in playing with my new $1300
> Macbook, I found it was doing basic matrix work (dense matrix
> multiplication, Cholesky) about 5x as fast as our Linux cluster. I
> haven't looked into it much, but given that CPU use is listed as nearing
> 200% on the dual core Mac, part of this may be due to the Mac taking
> advantage of both cores. My hope is that with a faster BLAS the
> difference between the Mac and our cluster for basic linear algebra will
> lessen or disappear.
>
> Any tips on what may be going wrong in the configure test process or how
> to get around this would be helpful.
>
> Thanks,
> Chris
>
> ----------------------------------------------------------------------------------------------
> Chris Paciorek / Asst. Professor Email: paciorek at hsph.harvard.edu
> Department of Biostatistics Voice: 617-432-4912
> Harvard School of Public Health Fax: 617-432-5619
> 655 Huntington Av., Bldg. 2-407 WWW: www.biostat.harvard.edu/~paciorek
> Boston, MA 02115 USA Permanent forward: paciorek at alumni.cmu.edu
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595