[Rd] Is it safe not to coerce matrices with as.double() in .C()?

Liaw, Andy andy_liaw at merck.com
Fri Sep 17 22:16:11 CEST 2010


 
> From: Simon Urbanek 
> 
> On Sep 17, 2010, at 1:22 PM, Liaw, Andy wrote:
> 
> > From: Liaw, Andy
> >> 
> >> From: Prof Brian Ripley
> >>> 
> >>> On Fri, 27 Aug 2010, peter dalgaard wrote:
> >>> 
> >>>> 
> >>>> On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
> >>>> 
> >>>>> I'd very much appreciate guidance on this.  A user reported that the
> >>>>> as.double() coercion used inside the .C() call for a function in my
> >>>>> package (specifically, randomForest:::predict.randomForest()) is
> >>>>> taking up a significant amount of time when called repeatedly, and
> >>>>> removing some of these reduced run time by 30-40% in some cases.
> >>>>> These arguments are components of the fitted model (thus do not
> >>>>> change), and are matrices.  Some basic tests show no difference in
> >>>>> the result when the coercions are removed (other than faster run time).
> >>>>> What I'd like to know is whether this is safe to do, or is it likely
> >>>>> to lead to trouble in the future?
> >>>> 
> >>>> In a word: yes. It is safe as long as you are absolutely sure that
> >>>> the argument has the right mode. The unsafeness comes in when people
> >>>> might unwittingly use, say, an integer vector where a double was
> >>>> expected, causing memory overruns and general mayhem.
> >>>>
> >>>> Notice, BTW, that if you switch to .Call or .External, then you have
> >>>> much more scope for handling such details on the C-side. E.g. you
> >>>> could coerce only if the object has the wrong mode, avoid
> >>>> duplicating things you won't be modifying anyway, etc.
> >>> 
> >>> But as as.double is effectively .Call it has the same freedom, and it
> >>> does nothing if no coercion is required.  The crunch here is likely
> >>> to be
> >>>
> >>>      'as.double' attempts to coerce its argument to be of double type:
> >>>      like 'as.vector' it strips attributes including names.  (To ensure
> >>>      that an object is of double type without stripping attributes, use
> >>>      'storage.mode'.)
> >>>
> >>> I suspect the issue is the copying to remove attributes, in which case
> >> 
> >> I can certainly believe this.  I've tried replacing as.double() with
> >> c(), thinking attributes need to be stripped.  That actually increased
> >> run time very slightly instead of reducing it.
> >> 
> >>> storage.mode(x) <- "double"
> >>> 
> >>> should be a null op and so both fast and safe.
> >> 
> >> Will follow this advice.  Thanks to both of you for the help!
> > 
> > My apologies for coming back to this so late.  I did some tests, and
> > found that
> >
> >  storage.mode(x) <- "double"
> >
> > isn't as cheap as I thought it might be.  Changing the code to this from
> >
> >  x <- as.double(x)
> >
> > did not give the expected speed improvement.  Here's a little test:
> > 
> > f1 <- function(x) { as.double(x); NULL }
> > f2 <- function(x) { storage.mode(x) <- "double"; NULL }
> > f3 <- function(x) { x <- x; NULL }
> > set.seed(917)
> > reps <- 500
> > x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
> > system.time(junk <- replicate(reps, f1(x)))
> > system.time(junk <- replicate(reps, f2(x)))
> > system.time(junk <- replicate(reps, f3(x)))
> > 
> > On my laptop running R  2.11.1 Patched (2010-06-26 r52410), I get:
> > 
> > R> system.time(junk <- replicate(reps, f1(x)))
> >   user  system elapsed 
> >   3.54    2.14    5.74 
> > R> system.time(junk <- replicate(reps, f2(x)))
> >   user  system elapsed 
> >   3.32    2.11    5.92 
> > R> system.time(junk <- replicate(reps, f3(x)))
> >   user  system elapsed 
> >      0       0       0 
> > 
> > Perhaps I need to first check whether the storage mode is as expected
> > before trying coercion?
> > 
> 
> Well, the devil is in the details. Although storage.mode<- is a noop
> itself, the issue is that it does trigger duplication because it is an
> assignment, not because storage mode would change anything. Technically,
> x <- x is a special case which is truly a noop whereas any call `foo<-`
> has to assume modification. So, yes, in your case
>
> f4 <- function(x) { if (storage.mode(x) != "double") storage.mode(x) <- "double"; NULL }
>
> will have the same speed as f3. If you are going into .Call then you
> might as well do that on the C side (with the benefit of being able to
> strip attributes since you can get them from the original object if you
> care...).
> 
> Cheers,
> Simon

Thanks a lot, Simon, for the clarification.  Unfortunately I'm not using
.Call(), but .C() with DUP=FALSE, and it's exactly the duplication that
I'm trying to avoid.  For now I just inserted tests (is.double() and
is.integer()) and only do the coercion if needed, prior to the .C()
call.  That gives the speed up that I was expecting.
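
A minimal sketch of this check-then-coerce approach, assuming a hypothetical helper named ensure_mode() (it is not part of base R or randomForest, and not the code actually used in the package). It follows Simon's f4() idea: storage.mode<- is only called when the mode is wrong, so an argument that already has the right storage mode is returned as-is rather than duplicated. Unlike as.double(), it also keeps attributes such as dim.

```r
## Hypothetical helper (assumption, not a base R or randomForest function):
## coerce x to the requested storage mode only when it differs.
ensure_mode <- function(x, mode = c("double", "integer")) {
  mode <- match.arg(mode)
  ## storage.mode<- is an assignment and so duplicates x even when the
  ## mode is already right; guarding it skips that copy.
  if (storage.mode(x) != mode) storage.mode(x) <- mode
  x
}

## Usage: an integer matrix is converted, and its dim attribute survives.
m <- matrix(1:6, 2L, 3L)        # integer storage
md <- ensure_mode(m, "double")
storage.mode(md)                # "double"
dim(md)                         # 2 3

## Rough timing comparison in the style of the test earlier in the thread
## (f5 is an illustrative name; no particular timings are claimed).
f1 <- function(x) { as.double(x); NULL }
f5 <- function(x) { x <- ensure_mode(x, "double"); NULL }
set.seed(917)
x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
system.time(junk <- replicate(50, f1(x)))
system.time(junk <- replicate(50, f5(x)))
```

The guarded arguments can then be handed to .C() (with DUP=FALSE, as above) in place of the unconditional as.double() calls.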

To do this more cleanly, I really need to learn .Call()...

Best,
Andy

> 
> > Best,
> > Andy
> > 
> > 
> > 
> >> Best,
> >> Andy
> >> 
> >> 
> >>> -- 
> >>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> >>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >>> University of Oxford,             Tel:  +44 1865 272861 (self)
> >>> 1 South Parks Road,                     +44 1865 272866 (PA)
> >>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >>> 
> >> Notice:  This e-mail message, together with any attachments, contains
> >> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> >> New Jersey, USA 08889), and/or its affiliates Direct contact information
> >> for affiliates is available at
> >> http://www.merck.com/contact/contacts.html) that may be confidential,
> >> proprietary copyrighted and/or legally privileged. It is intended solely
> >> for the use of the individual or entity named on this message. If you
> >> are not the intended recipient, and have received this message in error,
> >> please notify us immediately by reply e-mail and then delete it from
> >> your system.
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >> 