[Rd] Is it safe not to coerce matrices with as.double() in .C()?
Liaw, Andy
andy_liaw at merck.com
Fri Sep 17 22:16:11 CEST 2010
> From: Simon Urbanek
>
> On Sep 17, 2010, at 1:22 PM, Liaw, Andy wrote:
>
> > From: Liaw, Andy
> >>
> >> From: Prof Brian Ripley
> >>>
> >>> On Fri, 27 Aug 2010, peter dalgaard wrote:
> >>>
> >>>>
> >>>> On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
> >>>>
> >>>>> I'd very much appreciate guidance on this. A user reported that the
> >>>>> as.double() coercion used inside the .C() call for a function in my
> >>>>> package (specifically, randomForest:::predict.randomForest()) is
> >>>>> taking up a significant amount of time when called repeatedly, and
> >>>>> removing some of these reduced run time by 30-40% in some cases.
> >>>>> These arguments are components of the fitted model (thus do not
> >>>>> change), and are matrices. Some basic tests show no difference in
> >>>>> the result when the coercions are removed (other than faster run
> >>>>> time). What I'd like to know is whether this is safe to do, or is it
> >>>>> likely to lead to trouble in the future?
> >>>>
> >>>> In a word: yes. It is safe as long as you are absolutely sure that
> >>>> the argument has the right mode. The unsafeness comes in when people
> >>>> might unwittingly use, say, an integer vector where a double was
> >>>> expected, causing memory overruns and general mayhem.
> >>>>
> >>>> Notice, BTW, that if you switch to .Call or .External, then you have
> >>>> much more scope for handling such details on the C-side. E.g. you
> >>>> could coerce only if the object has the wrong mode, avoid
> >>>> duplicating things you won't be modifying anyway, etc.
> >>>
> >>> But as as.double is effectively .Call it has the same freedom, and it
> >>> does nothing if no coercion is required. The crunch here is likely to
> >>> be
> >>>
> >>>      'as.double' attempts to coerce its argument to be of double type:
> >>>      like 'as.vector' it strips attributes including names. (To ensure
> >>>      that an object is of double type without stripping attributes,
> >>>      use 'storage.mode'.)
> >>>
> >>> I suspect the issue is the copying to remove attributes, in which case
> >>
> >> I can certainly believe this. I've tried replacing as.double() with
> >> c(), thinking attributes need to be stripped. That actually increased
> >> run time very slightly instead of reducing it.
> >>
> >>> storage.mode(x) <- "double"
> >>>
> >>> should be a null op and so both fast and safe.
> >>
> >> Will follow this advice. Thanks to both of you for the help!
> >
> > My apologies for coming back to this so late. I did some tests and
> > found that
> >
> > storage.mode(x) <- "double"
> >
> > isn't as light on resources as I thought it might be. Changing the
> > code to this from
> >
> > x <- as.double(x)
> >
> > did not give the expected speed improvement. Here's a little test:
> >
> > f1 <- function(x) { as.double(x); NULL }
> > f2 <- function(x) { storage.mode(x) <- "double"; NULL }
> > f3 <- function(x) { x <- x; NULL }
> > set.seed(917)
> > reps <- 500
> > x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
> > system.time(junk <- replicate(reps, f1(x)))
> > system.time(junk <- replicate(reps, f2(x)))
> > system.time(junk <- replicate(reps, f3(x)))
> >
> > On my laptop running R 2.11.1 Patched (2010-06-26 r52410), I get:
> >
> > R> system.time(junk <- replicate(reps, f1(x)))
> >    user  system elapsed
> >    3.54    2.14    5.74
> > R> system.time(junk <- replicate(reps, f2(x)))
> >    user  system elapsed
> >    3.32    2.11    5.92
> > R> system.time(junk <- replicate(reps, f3(x)))
> >    user  system elapsed
> >       0       0       0
> >
> > Perhaps I need to first check and see if the storage mode is as
> > expected before trying coercion?
> >
>
> Well, the devil is in the details. Although storage.mode<- is a noop
> itself, the issue is that it does trigger duplication because it is an
> assignment, not because storage mode would change anything.
> Technically, x <- x is a special case which is truly a noop, whereas any
> call to `foo<-` has to assume modification. So, yes, in your case
>
>   f4 <- function(x) {
>     if (storage.mode(x) != "double") storage.mode(x) <- "double"
>     NULL
>   }
>
> will have the same speed as f3. If you are going into .Call then you
> might as well do that on the C side (with the benefit of being able to
> strip attributes, since you can get them from the original object if
> you care...).
>
> Cheers,
> Simon
Thanks a lot, Simon, for the clarification. Unfortunately I'm not using
.Call(), but .C() with DUP=FALSE, and it's exactly the duplication that
I'm trying to avoid. For now I just inserted tests (is.double() and
is.integer()) and only do the coercion if needed, prior to the .C()
call. That gives the speed-up that I was expecting.
To do this more cleanly, I really need to learn .Call()...
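
The check-before-coercing approach described above (essentially Simon's f4) can be sketched as a small R helper. This is a minimal sketch, not the code randomForest actually uses; the name ensure_mode() is hypothetical. It calls storage.mode<- only when the mode is wrong, so an argument that already has the right mode is returned without any replacement-function call, and a coerced matrix keeps its dim attribute.

```r
# Hypothetical helper (not the actual randomForest code): coerce only
# when the storage mode differs, so an argument that already has the
# right mode is returned without calling `storage.mode<-`.
ensure_mode <- function(x, mode = c("double", "integer")) {
  mode <- match.arg(mode)
  if (storage.mode(x) != mode) {
    # storage.mode<- keeps attributes such as dim, unlike as.double()
    storage.mode(x) <- mode
  }
  x
}

m_dbl <- matrix(rnorm(6), 2L, 3L)   # already double
m_int <- matrix(1:6, 2L, 3L)        # integer, needs coercion

identical(ensure_mode(m_dbl, "double"), m_dbl)   # TRUE
storage.mode(ensure_mode(m_int, "double"))       # "double"
dim(ensure_mode(m_int, "double"))                # 2 3
```

In the .C() call one would then pass ensure_mode(x, "double") where as.double(x) was used before; whether this removes every copy depends on the R version, so re-running the timing test above is the way to confirm it.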
Best,
Andy
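
The attribute behaviour described in the as.double documentation quoted by Prof Ripley can be checked directly. The sketch below uses a small, made-up integer matrix with row names: as.double() returns a bare vector, while storage.mode<- changes the type and keeps dim and dimnames. Even for a matrix that is already double, as.double() returns an attribute-free result, which is consistent with the copying cost suspected in this thread.

```r
# Small made-up example: an integer matrix with row names.
x <- matrix(1:4, 2L, 2L, dimnames = list(c("a", "b"), NULL))

y1 <- as.double(x)            # double, but dim and dimnames are dropped
y2 <- x
storage.mode(y2) <- "double"  # double, attributes kept

attributes(y1)                              # NULL
identical(attributes(y2), attributes(x))    # TRUE

# as.double() strips attributes even when no type change is needed
z <- matrix(c(0.5, 1.5, 2.5, 3.5), 2L, 2L)
is.null(attributes(as.double(z)))           # TRUE
```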
>
> > Best,
> > Andy
> >
> >
> >
> >> Best,
> >> Andy
> >>
> >>
> >>> --
> >>> Brian D. Ripley, ripley at stats.ox.ac.uk
> >>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >>> University of Oxford, Tel: +44 1865 272861 (self)
> >>> 1 South Parks Road, +44 1865 272866 (PA)
> >>> Oxford OX1 3TG, UK Fax: +44 1865 272595
> >>>
> >> Notice: This e-mail message, together with any attachments, contains
> >> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> >> New Jersey, USA 08889), and/or its affiliates Direct contact
> >> information for affiliates is available at
> >> http://www.merck.com/contact/contacts.html) that may be confidential,
> >> proprietary copyrighted and/or legally privileged. It is intended
> >> solely for the use of the individual or entity named on this message.
> >> If you are not the intended recipient, and have received this message
> >> in error, please notify us immediately by reply e-mail and then delete
> >> it from your system.
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>