[Rd] Is it safe not to coerce matrices with as.double() in .C()?

Liaw, Andy andy_liaw at merck.com
Fri Sep 17 19:22:00 CEST 2010


From: Liaw, Andy
> 
> From: Prof Brian Ripley
> > 
> > On Fri, 27 Aug 2010, peter dalgaard wrote:
> > 
> > >
> > > On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
> > >
> > >> I'd very much appreciate guidance on this.  A user reported that the
> > >> as.double() coercion used inside the .C() call for a function in my
> > >> package (specifically, randomForest:::predict.randomForest()) is
> > >> taking up a significant amount of time when called repeatedly, and
> > >> removing some of these reduced run time by 30-40% in some cases.
> > >> These arguments are components of the fitted model (thus do not
> > >> change), and are matrices.  Some basic tests show no difference in
> > >> the result when the coercions are removed (other than faster run time).
> > >> What I'd like to know is whether this is safe to do, or is it likely to
> > >> lead to trouble in the future?
> > >
> > > In a word: yes. It is safe as long as you are absolutely sure that
> > > the argument has the right mode. The unsafeness comes in when people
> > > might unwittingly use, say, an integer vector where a double was
> > > expected, causing memory overruns and general mayhem.
> > >
> > > Notice, BTW, that if you switch to .Call or .External, then you have
> > > much more scope for handling such details on the C-side. E.g. you
> > > could coerce only if the object has the wrong mode, avoid
> > > duplicating things you won't be modifying anyway, etc.
> > 
> > But as as.double is effectively .Call it has the same freedom, and it
> > does nothing if no coercion is required.  The crunch here is likely to
> > be
> > 
> >       ‘as.double’ attempts to coerce its argument to be of double type:
> >       like ‘as.vector’ it strips attributes including names.  (To ensure
> >       that an object is of double type without stripping attributes, use
> >       ‘storage.mode’.)
> > 
> > I suspect the issue is the copying to remove attributes, in which case
> 
> I can certainly believe this.  I've tried replacing as.double() with
> c(), thinking attributes need to be stripped.  That actually increased
> run time very slightly instead of reducing it.
>  
> > storage.mode(x) <- "double"
> > 
> > should be a null op and so both fast and safe.
> 
> Will follow this advice.  Thanks to both of you for the help!

My apologies for coming back to this so late.  I did some tests, and found that

  storage.mode(x) <- "double"

isn't as light on resources as I thought it might be.  Changing the code to this from

  x <- as.double(x)

did not give the expected speed improvement.  Here's a little test:

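# Each function returns NULL so the timings are not affected by
# replicate() collecting a large result.
f1 <- function(x) { as.double(x); NULL }                 # strips attributes
f2 <- function(x) { storage.mode(x) <- "double"; NULL }  # keeps attributes
f3 <- function(x) { x <- x; NULL }                       # baseline: plain assignment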
set.seed(917)
reps <- 500
x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
system.time(junk <- replicate(reps, f1(x)))
system.time(junk <- replicate(reps, f2(x)))
system.time(junk <- replicate(reps, f3(x)))

On my laptop running R  2.11.1 Patched (2010-06-26 r52410), I get:

R> system.time(junk <- replicate(reps, f1(x)))
   user  system elapsed 
   3.54    2.14    5.74 
R> system.time(junk <- replicate(reps, f2(x)))
   user  system elapsed 
   3.32    2.11    5.92 
R> system.time(junk <- replicate(reps, f3(x)))
   user  system elapsed 
      0       0       0 

Perhaps I need to first check and see if the storage mode is as expected before trying coercion?
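
A minimal sketch of that check-then-coerce idea is below.  The helper name ensure_double() is hypothetical (it is not part of randomForest), and the sketch assumes that is.double() only tests the type and so is cheap.  Whether it actually recovers the speed would still need to be timed as above.

```r
# Hypothetical helper (not part of randomForest): coerce only when needed.
ensure_double <- function(x) {
  # is.double() only tests the storage type; the object is not modified.
  if (!is.double(x)) storage.mode(x) <- "double"
  x
}

x_dbl <- matrix(rnorm(6), 2, 3)   # already double
x_int <- matrix(1:6, 2, 3)        # integer storage

y_dbl <- ensure_double(x_dbl)     # no coercion attempted
y_int <- ensure_double(x_int)     # converted; the dim attribute is kept
```

Because storage.mode<- keeps attributes, the matrix dimensions survive the conversion, unlike with as.double().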

Best,
Andy


 
> Best,
> Andy
> 
>  
> > -- 
> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > 
> Notice:  This e-mail message, together with any attachments, contains
> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> New Jersey, USA 08889), and/or its affiliates Direct contact information
> for affiliates is available at 
> http://www.merck.com/contact/contacts.html) that may be confidential,
> proprietary copyrighted and/or legally privileged. It is intended solely
> for the use of the individual or entity named on this message. If you are
> not the intended recipient, and have received this message in error,
> please notify us immediately by reply e-mail and then delete it from 
> your system.
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 