[Rd] we need an exists/get hybrid

Sven E. Templer sven.templer at gmail.com
Thu Dec 4 20:04:29 CET 2014


David, 'assign' is slower than '<-':

##   median                                          expr

## 1 0.1440                                  X <- letters
## 2 0.4420         .Internal(assign("X", letters, e, F))
## 3 1.1820                           e[["X"]] <- letters
## 4 1.2570                                e$X <- letters
## 5 1.8380 assign("X", letters, envir = e, inherits = F)
## 6 1.9415         assign("X", letters, e, inherits = F)

(median times in microseconds over 500 runs; see http://rpubs.com/setempler/46568)
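A rough way to repeat this comparison without the microbenchmark package is sketched below. It uses only base R; the loop count `n` and the vector name `timings` are illustrative choices, not taken from the linked benchmark, and absolute numbers will differ by machine and R version.

```r
# Minimal sketch: total elapsed time for n assignments into environment `e`,
# using three of the forms from the table above.
e <- new.env()
n <- 1e5L  # illustrative repetition count (assumption)

timings <- c(
  dollar  = system.time(for (i in seq_len(n)) e$X <- letters)[["elapsed"]],
  bracket = system.time(for (i in seq_len(n)) e[["X"]] <- letters)[["elapsed"]],
  assign  = system.time(
    for (i in seq_len(n)) assign("X", letters, envir = e, inherits = FALSE)
  )[["elapsed"]]
)

# Seconds for n assignments per form; divide by n for a per-call estimate.
print(timings)
```

The `for` loop adds the same fixed overhead to every form, so only the differences between entries are meaningful.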

---

Two questions:

Is 'X <- letters' the fastest because it does not need to switch the
environment from 'benchmark' to 'e'?
Why is the call to '.Internal' faster than '[[<-' here, when '[[' was
faster than '.Internal(get())' in Winston's 'get'/'[[' benchmark?

thanks,
s

On 4 December 2014 at 15:24, Lorenz, David <lorenz at usgs.gov> wrote:
> All,
>   So that suggests that .GlobalEnv[["X"]] is more efficient than get("X",
> pos=1L). What about .GlobalEnv[["X"]] <-  value, compared to assign("X",
> value)?
> Dave
>
> On Wed, Dec 3, 2014 at 3:30 PM, Peter Haverty <haverty.peter at gene.com>
> wrote:
>
>> Thanks Winston!  I'm amazed that "[[" beats calling the .Internal
>> directly.  I guess the difference between .Primitive vs. .Internal is
>> pretty significant for things on this time scale.
>>
>> NULL meaning NULL and NULL meaning undefined would lead to the same path
>> for much of my code.  I'll be swapping out many exists and get calls later
>> today.  Thanks!
>>
>> I do still think it would be very useful to have some way to discriminate
>> the two NULL cases.  I'm reminded of how perl does the same thing.  It's
>> been a while, but it was something like
>>
>> # This is still two lookups, but it has the "defined" concept.
>> if (defined($x{'c'})) { print $x{'c'}; }
>>
>> or maybe even
>>
>> if (defined(my $foo = $x{'c'})) { print $foo; }
>>
>>
>> Thanks again for the timings!
>>
>>
>> Pete
>>
>> ____________________
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phaverty at gene.com
>>
>> On Wed, Dec 3, 2014 at 12:48 PM, Winston Chang <winstonchang1 at gmail.com>
>> wrote:
>>
>> > I've looked at related speed issues in the past, and have a couple
>> > related points to add. (I've put the info below at
>> > http://rpubs.com/wch/46428.)
>> >
>> > There's a significant amount of overhead just from calling the R
>> > function get(). This is true even when you skip the pos argument and
>> > provide envir. For example, if you call get(), it takes much more time
>> > than .Internal(get()), which is what get() does.
>> >
>> > If you already know that the object exists in an environment, it's
>> > faster to use e$x, and slightly faster still to use e[["x"]]:
>> >
>> > e <- new.env()
>> > e$a <- 1
>> >
>> > # Accessing objects in environments
>> > microbenchmark(
>> >   get("a", e, inherits = FALSE),
>> >   get("a", envir = e, inherits = FALSE),
>> >   .Internal(get("a", e, "any", FALSE)),
>> >   e$a,
>> >   e[["a"]],
>> >   .Primitive("[[")(e, "a"),
>> >
>> >   unit = "us"
>> > )
>> > #>   median                                  name
>> > #> 1 1.0300         get("a", e, inherits = FALSE)
>> > #> 2 0.9425 get("a", envir = e, inherits = FALSE)
>> > #> 3 0.3080  .Internal(get("a", e, "any", FALSE))
>> > #> 4 0.2305                                   e$a
>> > #> 5 0.1740                              e[["a"]]
>> > #> 6 0.2905              .Primitive("[[")(e, "a")
>> >
>> >
>> > A similar thing happens with exists(): the R function wrapper adds
>> > significant overhead on top of .Internal(exists()). It's also faster
>> > to use $ and [[, then test for NULL, but of course this won't
>> > distinguish between objects that don't exist, and those that do exist
>> > but have a NULL value:
>> >
>> > # Test for existence of `a` (which exists), and `c` (which doesn't)
>> > microbenchmark(
>> >   exists('a', e, inherits = FALSE),
>> >   exists('a', envir = e, inherits = FALSE),
>> >   .Internal(exists('a', e, 'any', FALSE)),
>> >   'a' %in% ls(e, all.names = TRUE),
>> >   is.null(e[['a']]),
>> >   is.null(e$a),
>> >
>> >   exists('c', e, inherits = FALSE),
>> >   exists('c', envir = e, inherits = FALSE),
>> >   .Internal(exists('c', e, 'any', FALSE)),
>> >   'c' %in% ls(e, all.names = TRUE),
>> >   is.null(e[['c']]),
>> >   is.null(e$c),
>> >
>> >   unit = "us"
>> > )
>> > #>    median                                     name
>> > #> 1  1.2015         exists("a", e, inherits = FALSE)
>> > #> 2  1.0545 exists("a", envir = e, inherits = FALSE)
>> > #> 3  0.3615  .Internal(exists("a", e, "any", FALSE))
>> > #> 4  7.6345         "a" %in% ls(e, all.names = TRUE)
>> > #> 5  0.3055                        is.null(e[["a"]])
>> > #> 6  0.3270                             is.null(e$a)
>> > #> 7  1.1890         exists("c", e, inherits = FALSE)
>> > #> 8  1.0370 exists("c", envir = e, inherits = FALSE)
>> > #> 9  0.3465  .Internal(exists("c", e, "any", FALSE))
>> > #> 10 7.5475         "c" %in% ls(e, all.names = TRUE)
>> > #> 11 0.2675                        is.null(e[["c"]])
>> > #> 12 0.3010                             is.null(e$c)
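The caveat about NULL values can be seen in a few lines. This is a small illustration, not part of the original benchmark; the binding `b` is a hypothetical addition used only to show the ambiguity.

```r
e <- new.env()
assign("a", 1, envir = e)
assign("b", NULL, envir = e)  # `b` is bound, but its value is NULL

# The NULL test cannot tell "bound to NULL" from "not bound at all":
is.null(e[["b"]])  # TRUE
is.null(e[["c"]])  # TRUE

# exists() does tell them apart, at the cost of the slower call:
exists("b", envir = e, inherits = FALSE)  # TRUE
exists("c", envir = e, inherits = FALSE)  # FALSE
```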
>> >
>> >
>> > -Winston
>> >
>> > On Tue, Dec 2, 2014 at 8:46 PM, Peter Haverty <haverty.peter at gene.com>
>> > wrote:
>> > > Hi All,
>> > >
>> > > I've been looking into speeding up the loading of packages that use a
>> > > lot of S4.  After profiling I noticed the "exists" function accounts
>> > > for a surprising fraction of the time.  I have some thoughts about
>> > > speeding up exists (below). More to the point of this post, Martin
>> > > Mächler noted that 'exists' and 'get' are often used in conjunction.
>> > > Both functions are different usages of the do_get C function, so it's
>> > > a pity to run that twice.
>> > >
>> > > "get" gives an error when a symbol is not found, so you can't just do a
>> > > 'get'.  With R's C library, one might do
>> > >
>> > > SEXP x = findVarInFrame3(env, symbol, TRUE);
>> > > if (x != R_UnboundValue) {
>> > >     // do stuff with x
>> > > }
>> > >
>> > > It would be very convenient to have something like this at the R
>> > > level. We don't want to do any tryCatch stuff or to add args to get
>> > > (That would kill any speed advantage. The overhead for handling
>> > > redundant args accounts for 30% of the time used by "exists").
>> > > Michael Lawrence and I worked out that we need a function that
>> > > returns either the desired object, or something that represents
>> > > R_UnboundValue. We also need a very cheap way to check if something
>> > > equals this new R_UnboundValue. This might look like
>> > >
>> > > if (defined(x <- fetch(symbol, env))) {
>> > >   do_stuff_with_x(x)
>> > > }
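One way to approximate the proposed pattern at the R level is sketched below. It assumes R >= 3.2.0, where get0() is available (it did not exist when this message was written). `fetch`, `defined`, and the sentinel `.unbound` are hypothetical names taken from or invented for this thread, not existing R functions, and this is not the C-level implementation being proposed.

```r
# Hypothetical sentinel standing in for R_UnboundValue. An environment is
# used because identical() compares environments by reference, so no
# ordinary value can be mistaken for it.
.unbound <- new.env()

# Single lookup: returns the bound value, or the sentinel if unbound.
fetch <- function(symbol, env) {
  get0(symbol, envir = env, inherits = FALSE, ifnotfound = .unbound)
}

# Cheap check against the sentinel.
defined <- function(x) !identical(x, .unbound)

e <- new.env()
assign("a", 1, envir = e)
assign("b", NULL, envir = e)  # bound to NULL

if (defined(x <- fetch("b", e))) print(x)  # body runs: `b` is bound
if (defined(x <- fetch("c", e))) print(x)  # body skipped: `c` is unbound
```

Unlike the is.null() test, this distinguishes a binding whose value is NULL from a missing binding, though the R-level wrappers still carry function-call overhead.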
>> > >
>> > > A few more thoughts about "exists":
>> > >
>> > > Moving the bit of R in the exists function to C saves 10% of the time.
>> > > Dropping the redundant pos and frame args entirely saves 30% of the
>> > > time used by this function. I suggest that the arguments of both get
>> > > and exists should be simplified to (x, envir, mode, inherits). The
>> > > existing C code handles numeric, character, and environment input for
>> > > where. The arg frame is rarely used (0/128 exists calls in the methods
>> > > package). Users that need to can call sys.frame themselves. get
>> > > already lacks a frame argument and the manpage for exists notes that
>> > > envir is only there for backwards compatibility. Let's deprecate the
>> > > extra args in exists and get and perhaps move the extra argument
>> > > handling to C in the interim.  Similarly, the "assign" function does
>> > > nothing with the "immediate" argument.
>> > >
>> > > I'd be interested to hear if there is any support for a "fetch"-like
>> > > function (and/or deprecating some unused arguments).
>> > >
>> > > All the best,
>> > > Pete
>> > >
>> > > ____________________
>> > > Peter M. Haverty, Ph.D.
>> > > Genentech, Inc.
>> > > phaverty at gene.com
>> > >
>> > >         [[alternative HTML version deleted]]
>> > >
>> > >
>> > > ______________________________________________
>> > > R-devel at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel


