[R-SIG-Mac] How to determine if a Mac is Nehalem-based
simon.urbanek at r-project.org
Thu Oct 21 18:36:08 CEST 2010
On Oct 21, 2010, at 7:47 AM, Stefan Evert wrote:
> On 21 Oct 2010, at 03:28, Simon Urbanek wrote:
>> It's not vague at all, it's MacPro4,1 and MacPro5,1 models (you can use "sysctl hw.model" to find out what you have). If in doubt, check on Wikipedia ;)
>> The latter uses the Nehalem architecture but I don't have a specimen of those so I can't confirm that the bug still holds true for those.
> Not just those ... I'm plagued by the same problem on my Penryn-based MacBookPro4,1. In 64-bit mode, BLAS performance breaks down to single core levels, whereas in 32-bit mode (i.e. R --arch=i386) it uses both cores. I posted some benchmark results to this list a few weeks ago.
Well, given that it is only a two-thread CPU there is not much you can gain, so I wouldn't lose sleep over it. If you have a 16-thread CPU it's a whole different story ;). For illustration, these are the timings from your benchmarks (only those that use BLAS) for 64-bit R 2.12.0 on 10.6.4 on a 2.66GHz MacPro4,1:
test                    R BLAS  vecLib   ATLAS     MKL
inner M %*% t(M) D      19.961   3.470   0.519   0.662
inner tcrossprod D       0.658   1.867   0.243   0.235
inner crossprod t(M) D   9.574   1.849   0.242   0.256
cosine normalised D      0.798   2.009   0.385   0.411
cosine general D         0.770   1.993   0.380   0.352
euclid() D               2.072   3.271   1.637   1.635
euclid() small D         0.515   0.821   0.421   0.395
As you can see both MKL and ATLAS outperform vecLib and R BLAS by an order of magnitude. It's sad, because vecLib used to be fairly well optimized ... (in fact it is actually some version of ATLAS which is even more strange ...).
> My solution has also been to switch to the reference BLAS, which outperforms vecLib on most of the operations I benchmarked, except for crossprod(), which is terribly slow (more than 10x slower than tcrossprod()). I've just tested again with R 2.12.0, and the situation has become even worse: now an explicit matrix multiplication M %*% t(M) -- which used to be fast -- performs as poorly as crossprod().
> Any ideas about this? The crossprod() slowdown isn't a Mac problem: I got similar results on a Pentium Dual Core laptop running Ubuntu. If this is a known problem of the reference BLAS, is there any way to work around it?
> Apart from the speed hiccups, in my benchmarks vecLib BLAS performed consistently slower than the reference BLAS. Is there evidence from other benchmarks / hardware architectures that vecLib can be faster? If not, perhaps the default should be _not_ to use vecLib on Mac? Or perhaps it would be possible to autodetect hardware in the R startup wrapper and select the BLAS that's known to run faster on this setup?
I don't think we would want to do that since that would prevent the user from choosing the BLAS they want to use. We will probably abandon vecLib as the default for the next release (more due to its numerical instability issues) and maybe provide all three options (vecLib, R BLAS, ATLAS) for the user to choose from in case they have a machine that can take advantage of it.
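For anyone who wants to check their own machine, the "sysctl hw.model" lookup mentioned earlier in this thread could be wrapped roughly as follows. This is only a sketch: the function name classify_model and the labels it prints are made up for illustration, and the model list contains only the machines named in this thread (MacPro4,1, MacPro5,1 and the MacBookPro4,1 report), so it is certainly not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: map a model string (as printed by `sysctl -n hw.model`)
# to a label. The labels and the model list are assumptions drawn only from
# the machines named in this thread.
classify_model() {
    case "$1" in
        MacPro4,1|MacPro5,1) echo "affected-macpro" ;;   # models named by Simon
        MacBookPro4,1)       echo "reported-penryn" ;;   # Stefan's report
        *)                   echo "unknown" ;;
    esac
}

# Query the model; on systems without hw.model the result is simply empty.
model=$(sysctl -n hw.model 2>/dev/null || true)
echo "hw.model: ${model:-not available} -> $(classify_model "$model")"
```

A label of "unknown" only means the model was not discussed here; it says nothing about whether the slowdown occurs on that machine.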