[Rd] Problem with R >3.0.0
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Aug 21 21:26:00 CEST 2013
On 21/08/2013 15:00, Prof Brian Ripley wrote:
> On 21/08/2013 13:45, peter dalgaard wrote:
>>
>> On Aug 20, 2013, at 19:42 , Shelton, Samuel wrote:
>>
>>> Hi all,
>>>
>>> Thanks for getting back to me. We would like to move over to v3.0.0 on
>>> our cluster so that we can build matrices larger than 46300*46300 (the
>>> limit in R <3.0.0), but so far we can't get things to work with R v3.0.0
>>> and higher. I am troubleshooting at the moment and now think that the
>>> problem is actually with the diag function, which was rewritten in
>>> version 3.0.0.
>>>
>>>
>>> The problem is definitely with the diag function. It does not occur on
>>> smaller matrices (20000*20000), and I think it may be a bug.
>>> This illustrates the problem:
>>>
>>> This was done on an iMac i5 with OS X 10.8.5, 16 GB RAM and R 3.0.1
>>> (but I do see the same for 3.0.0). This does not occur when I run it
>>> with R 2.15.2.
>>>
>>
>>
>> Thanks. I can condense this to
>>
>>> M <- matrix(1,23170,23170) ; diag(M) <- 0 ; range(colSums(M))
>> [1] 23169 23169
>>> M <- matrix(1,23171,23171) ; diag(M) <- 0 ; range(colSums(M))
>> [1] 0 23170
>
> A much faster check is to look at M[1:3, 1:3]
>
>> and the fact that 2^14.5 is 23170.48 is not likely to be a coincidence...
>>
>> It is only happening with some of my builds, though. In particular, my
>> MacPorts build of 3.0.1 does not have the problem on Snow Leopard, nor
>> does the CRAN build of 3.0.0, still on Snow Leopard. It takes forever
>> to check on a 4GB machine....
>
> Note that this does not use the diag() function but diag<-(), which is
> essentially unchanged since 2.15.x (the error detection was moved above
> an expensive calculation).
>
> It works correctly on x86_64 Linux and Solaris. I suspect a
> platform-specific issue in
>
> x[cbind(i, i)] <- value
>
>
I have tracked this down to an issue with memcpy on vectors of 2^32 or
more bytes. That very likely explains why it appears in some OS X
builds and not others (depending on the compiler and libc used), and not
on other platforms.
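
The link between the failing dimension and the 2^32-byte threshold can be checked with a little arithmetic. This is a minimal sketch, assuming 8 bytes per element of a double matrix and ignoring the vector header; the helper name bytes_for is hypothetical.

```r
# Hypothetical helper: payload size in bytes of an n x n double matrix,
# assuming 8 bytes per element and ignoring the R vector header.
bytes_for <- function(n) 8 * n^2

limit <- 2^32

below <- bytes_for(23170) < limit    # largest size that behaved correctly
above <- bytes_for(23171) >= limit   # smallest size that showed the problem
threshold <- sqrt(limit / 8)         # dimension at which 8 * n^2 reaches 2^32
```

Both comparisons evaluate to TRUE, and threshold equals 2^14.5, which matches the 23170/23171 boundary in the condensed example.
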
I am looking into a workaround that only uses smaller sections for
memcpy without losing all the performance gains.
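
The idea of copying in smaller sections can be illustrated at the R level. This is only a sketch of the chunking pattern, not the actual C-level change: the function copy_in_chunks and its chunk parameter are hypothetical, and the real workaround would apply the same idea to memcpy calls inside R's internals.

```r
# Hypothetical illustration: copy `src` into `dst` in blocks of at most
# `chunk` elements, so that no single copy spans the whole vector.
copy_in_chunks <- function(dst, src, chunk = 2^20) {
  stopifnot(length(dst) == length(src), chunk >= 1)
  n <- length(src)
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)  # last index of this block
    idx <- start:end
    dst[idx] <- src[idx]              # copy one bounded block
    start <- end + 1
  }
  dst
}

x <- as.numeric(1:10)
y <- copy_in_chunks(numeric(10), x, chunk = 3)
```

With a chunk size that keeps each block well under 2^32 bytes, the per-call overhead stays small relative to the data copied, which is the trade-off the workaround aims for.
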
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595