[Rd] Is it safe not to coerce matrices with as.double() in .C()?
Simon Urbanek
simon.urbanek at r-project.org
Fri Sep 17 21:18:13 CEST 2010
On Sep 17, 2010, at 1:22 PM, Liaw, Andy wrote:
> From: Liaw, Andy
>>
>> From: Prof Brian Ripley
>>>
>>> On Fri, 27 Aug 2010, peter dalgaard wrote:
>>>
>>>>
>>>> On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
>>>>
>>>>> I'd very much appreciate guidance on this. A user reported that the
>>>>> as.double() coercion used inside the .C() call for a function in my
>>>>> package (specifically, randomForest:::predict.randomForest()) is
>>>>> taking up a significant amount of time when called repeatedly, and
>>>>> removing some of these reduced run time by 30-40% in some cases.
>>>>> These arguments are components of the fitted model (thus do not
>>>>> change), and are matrices. Some basic tests show no difference in
>>>>> the result when the coercions are removed (other than faster run time).
>>>>> What I'd like to know is whether this is safe to do, or is it likely to
>>>>> lead to trouble in the future?
>>>>
>>>> In a word: yes. It is safe as long as you are absolutely sure that
>>>> the argument has the right mode. The unsafeness comes in when people
>>>> might unwittingly use, say, an integer vector where a double was
>>>> expected, causing memory overruns and general mayhem.
>>>>
>>>> Notice, BTW, that if you switch to .Call or .External, then you have
>>>> much more scope for handling such details on the C-side. E.g. you
>>>> could coerce only if the object has the wrong mode, avoid
>>>> duplicating things you won't be modifying anyway, etc.
>>>
>>> But as as.double is effectively .Call it has the same freedom, and it
>>> does nothing if no coercion is required. The crunch here is likely to be
>>>
>>>   ‘as.double’ attempts to coerce its argument to be of double type:
>>>   like ‘as.vector’ it strips attributes including names. (To ensure
>>>   that an object is of double type without stripping attributes, use
>>>   ‘storage.mode’.)
>>>
>>> I suspect the issue is the copying to remove attributes, in which case
>>
>> I can certainly believe this. I've tried replacing
>> as.double() with c(), thinking attributes need to be stripped.
>> That actually increased run time very slightly instead of reducing it.
>>
>>> storage.mode(x) <- "double"
>>>
>>> should be a null op and so both fast and safe.
>>
>> Will follow this advice. Thanks to both of you for the help!
>
> My apologies for coming back to this so late. I did some tests, and found that
>
> storage.mode(x) <- "double"
>
> isn't as light on resources as I thought it might be. Changing the code to this from
>
> x <- as.double(x)
>
> did not give the expected speed improvement. Here's a little test:
>
> f1 <- function(x) { as.double(x); NULL }
> f2 <- function(x) { storage.mode(x) <- "double"; NULL }
> f3 <- function(x) { x <- x; NULL }
> set.seed(917)
> reps <- 500
> x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
> system.time(junk <- replicate(reps, f1(x)))
> system.time(junk <- replicate(reps, f2(x)))
> system.time(junk <- replicate(reps, f3(x)))
>
> On my laptop running R 2.11.1 Patched (2010-06-26 r52410), I get:
>
> R> system.time(junk <- replicate(reps, f1(x)))
>    user  system elapsed
>    3.54    2.14    5.74
> R> system.time(junk <- replicate(reps, f2(x)))
>    user  system elapsed
>    3.32    2.11    5.92
> R> system.time(junk <- replicate(reps, f3(x)))
>    user  system elapsed
>       0       0       0
>
> Perhaps I need to first check and see if the storage mode is as expected before trying coercion?
>
Well, the devil is in the details. Although storage.mode<- is itself a no-op here, it still triggers duplication: the copy happens because it is an assignment, not because the storage mode would change anything. Technically, x <- x is a special case which is truly a no-op, whereas any call to `foo<-` has to assume modification. So, yes, in your case

f4 <- function(x) { if (storage.mode(x) != "double") storage.mode(x) <- "double"; NULL }

will have the same speed as f3. If you are moving to .Call, then you could just as well do that check on the C side (with the benefit of being able to strip attributes, since you can get them from the original object if you care...).
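
For illustration, the check-before-coerce idea can be wrapped in a small helper. This is only a sketch: the name ensure_double and the toy matrices are made up here and are not taken from randomForest.

```r
# Hypothetical helper (not from randomForest): call the replacement
# function `storage.mode<-` only when the storage mode is not already
# "double", so the common case skips the assignment entirely.
ensure_double <- function(x) {
  if (storage.mode(x) != "double") storage.mode(x) <- "double"
  x
}

m_int <- matrix(1:6, nrow = 2)       # integer storage, needs coercion
m_dbl <- matrix(rnorm(6), nrow = 2)  # already double, left untouched

storage.mode(ensure_double(m_int))      # storage mode after coercion
dim(ensure_double(m_int))               # dim attribute is kept
identical(ensure_double(m_dbl), m_dbl)  # values and attributes unchanged
```

Unlike as.double(), this keeps the dim attribute, so the result is still a matrix when it is handed to .C().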
Cheers,
Simon
> Best,
> Andy
>
>
>
>> Best,
>> Andy
>>
>>
>>> --
>>> Brian D. Ripley, ripley at stats.ox.ac.uk
>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (self)
>>> 1 South Parks Road, +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>>