[Rd] we need an exists/get hybrid

Lorenz, David lorenz at usgs.gov
Thu Dec 4 15:24:17 CET 2014


All,
  So that suggests that .GlobalEnv[["X"]] is more efficient than get("X",
pos=1L). What about .GlobalEnv[["X"]] <-  value, compared to assign("X",
value)?
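
A rough way to check this, as a minimal sketch only: the names X, Y, value and n are made up for illustration, and it uses base R's system.time() rather than microbenchmark, so the timings will be coarse. Note that assign("X", value) without envir binds in the calling frame, which is the global environment only at top level, so the sketch passes envir explicitly.

```r
value <- 42

# Form 1: replacement via `[[<-` on the global environment.
.GlobalEnv[["X"]] <- value

# Form 2: assign(), with the target environment given explicitly.
assign("Y", value, envir = .GlobalEnv)

# Both forms should leave an identical binding behind.
same_binding <- identical(
  get("X", envir = .GlobalEnv, inherits = FALSE),
  get("Y", envir = .GlobalEnv, inherits = FALSE)
)

# Coarse timing of n repeated assignments with each form.
n <- 1e5
t_bracket <- system.time(
  for (i in seq_len(n)) { .GlobalEnv[["X"]] <- i }
)[["elapsed"]]
t_assign <- system.time(
  for (i in seq_len(n)) { assign("X", i, envir = .GlobalEnv) }
)[["elapsed"]]

print(same_binding)
print(c(bracket = t_bracket, assign = t_assign))
```

The elapsed times depend on the machine and R version, so none are shown here.
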
Dave

On Wed, Dec 3, 2014 at 3:30 PM, Peter Haverty <haverty.peter at gene.com>
wrote:

> Thanks Winston!  I'm amazed that "[[" beats calling the .Internal
> directly.  I guess the difference between .Primitive vs. .Internal is
> pretty significant for things on this time scale.
>
> NULL meaning NULL and NULL meaning undefined would lead to the same path
> for much of my code.  I'll be swapping out many exists and get calls later
> today.  Thanks!
>
> I do still think it would be very useful to have some way to discriminate
> the two NULL cases.  I'm reminded of how perl does the same thing.  It's
> been a while, but it was something like
>
> if (defined($x{'c'})) { print $x{'c'}; }  # still two lookups, but it has
>                                           # the "defined" concept
>
> or maybe even
>
> if (defined(my $foo = $x{'c'})) { print $foo; }
>
>
> Thanks again for the timings!
>
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phaverty at gene.com
>
> On Wed, Dec 3, 2014 at 12:48 PM, Winston Chang <winstonchang1 at gmail.com>
> wrote:
>
> > I've looked at related speed issues in the past, and have a couple
> > related points to add. (I've put the info below at
> > http://rpubs.com/wch/46428.)
> >
> > There's a significant amount of overhead just from calling the R
> > function get(). This is true even when you skip the pos argument and
> > provide envir. For example, if you call get(), it takes much more time
> > than .Internal(get()), which is what get() does.
> >
> > If you already know that the object exists in an environment, it's
> > faster to use e$x, and slightly faster still to use e[["x"]]:
> >
> > e <- new.env()
> > e$a <- 1
> >
> > # Accessing objects in environments
> > microbenchmark(
> >   get("a", e, inherits = FALSE),
> >   get("a", envir = e, inherits = FALSE),
> >   .Internal(get("a", e, "any", FALSE)),
> >   e$a,
> >   e[["a"]],
> >   .Primitive("[[")(e, "a"),
> >
> >   unit = "us"
> > )
> > #>   median                                  name
> > #> 1 1.0300         get("a", e, inherits = FALSE)
> > #> 2 0.9425 get("a", envir = e, inherits = FALSE)
> > #> 3 0.3080  .Internal(get("a", e, "any", FALSE))
> > #> 4 0.2305                                   e$a
> > #> 5 0.1740                              e[["a"]]
> > #> 6 0.2905              .Primitive("[[")(e, "a")
> >
> >
> > A similar thing happens with exists(): the R function wrapper adds
> > significant overhead on top of .Internal(exists()). It's also faster
> > to use $ and [[, then test for NULL, but of course this won't
> > distinguish between objects that don't exist, and those that do exist
> > but have a NULL value:
> >
> > # Test for existence of `a` (which exists), and `c` (which doesn't)
> > microbenchmark(
> >   exists('a', e, inherits = FALSE),
> >   exists('a', envir = e, inherits = FALSE),
> >   .Internal(exists('a', e, 'any', FALSE)),
> >   'a' %in% ls(e, all.names = TRUE),
> >   is.null(e[['a']]),
> >   is.null(e$a),
> >
> >   exists('c', e, inherits = FALSE),
> >   exists('c', envir = e, inherits = FALSE),
> >   .Internal(exists('c', e, 'any', FALSE)),
> >   'c' %in% ls(e, all.names = TRUE),
> >   is.null(e[['c']]),
> >   is.null(e$c),
> >
> >   unit = "us"
> > )
> > #>    median                                     name
> > #> 1  1.2015         exists("a", e, inherits = FALSE)
> > #> 2  1.0545 exists("a", envir = e, inherits = FALSE)
> > #> 3  0.3615  .Internal(exists("a", e, "any", FALSE))
> > #> 4  7.6345         "a" %in% ls(e, all.names = TRUE)
> > #> 5  0.3055                        is.null(e[["a"]])
> > #> 6  0.3270                             is.null(e$a)
> > #> 7  1.1890         exists("c", e, inherits = FALSE)
> > #> 8  1.0370 exists("c", envir = e, inherits = FALSE)
> > #> 9  0.3465  .Internal(exists("c", e, "any", FALSE))
> > #> 10 7.5475         "c" %in% ls(e, all.names = TRUE)
> > #> 11 0.2675                        is.null(e[["c"]])
> > #> 12 0.3010                             is.null(e$c)
> >
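
To make the NULL ambiguity concrete, here is a minimal sketch in the same setup as the benchmark above (a fresh environment e; the binding `a` is deliberately given the value NULL here, which differs from the timing example where it was 1):

```r
e <- new.env()
assign("a", NULL, envir = e)   # `a` exists, and its value is NULL
# `c` is never defined in e

# The NULL test gives the same answer for both names ...
null_a <- is.null(e[["a"]])
null_c <- is.null(e[["c"]])

# ... while exists() tells them apart.
exists_a <- exists("a", envir = e, inherits = FALSE)
exists_c <- exists("c", envir = e, inherits = FALSE)

print(c(null_a = null_a, null_c = null_c,
        exists_a = exists_a, exists_c = exists_c))
```

So the is.null() shortcut is only safe when NULL is never stored as a legitimate value in the environment.
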
> >
> > -Winston
> >
> > On Tue, Dec 2, 2014 at 8:46 PM, Peter Haverty <haverty.peter at gene.com>
> > wrote:
> > > Hi All,
> > >
> > > I've been looking into speeding up the loading of packages that use a
> > > lot of S4.  After profiling I noticed the "exists" function accounts
> > > for a surprising fraction of the time.  I have some thoughts about
> > > speeding up exists (below). More to the point of this post, Martin
> > > Mächler noted that 'exists' and 'get' are often used in conjunction.
> > > Both functions are different usages of the do_get C function, so it's
> > > a pity to run that twice.
> > >
> > > "get" gives an error when a symbol is not found, so you can't just do a
> > > 'get'.  With R's C library, one might do
> > >
> > > SEXP x = findVarInFrame3(env, symbol, TRUE);
> > > if (x != R_UnboundValue) {
> > >     // do stuff with x
> > > }
> > >
> > > It would be very convenient to have something like this at the R level.
> > > We don't want to do any tryCatch stuff or to add args to get (That
> > > would kill any speed advantage. The overhead for handling redundant
> > > args accounts for 30% of the time used by "exists").  Michael Lawrence
> > > and I worked out that we need a function that returns either the
> > > desired object, or something that represents R_UnboundValue. We also
> > > need a very cheap way to check if something equals this new
> > > R_UnboundValue. This might look like
> > >
> > > if (defined(x <- fetch(symbol, env))) {
> > >   do_stuff_with_x(x)
> > > }
> > >
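
For what it's worth, the fetch()/defined() pair can be approximated at the R level with a private sentinel object. This is only a sketch under assumptions: it needs R 3.2.0 or later, where base R has get0() with an ifnotfound argument, and fetch, defined and .unbound are hypothetical names (the first two borrowed from the pseudocode above), not existing functions. It makes no claim about speed.

```r
# Hypothetical sentinel standing in for R_UnboundValue: a fresh environment
# is identical() only to itself, so no ordinary value is mistaken for it.
.unbound <- new.env()

# Hypothetical fetch(): a single get0() call that returns the sentinel when
# `symbol` has no binding in `env` (enclosing environments are not searched).
fetch <- function(symbol, env) {
  get0(symbol, envir = env, inherits = FALSE, ifnotfound = .unbound)
}

# Hypothetical defined(): TRUE unless `x` is the sentinel.
defined <- function(x) !identical(x, .unbound)

e <- new.env()
assign("a", NULL, envir = e)   # bound, with value NULL

if (defined(x <- fetch("a", e))) print("a is bound")
if (!defined(y <- fetch("c", e))) print("c is not bound")
```

Because the sentinel is a distinct object rather than NULL, a binding whose value is NULL still counts as defined.
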
> > > A few more thoughts about "exists":
> > >
> > > Moving the bit of R in the exists function to C saves 10% of the time.
> > > Dropping the redundant pos and frame args entirely saves 30% of the
> > > time used by this function. I suggest that the arguments of both get
> > > and exists should be simplified to (x, envir, mode, inherits). The
> > > existing C code handles numeric, character, and environment input for
> > > where. The arg frame is rarely used (0/128 exists calls in the methods
> > > package). Users that need to can call sys.frame themselves. get already
> > > lacks a frame argument and the manpage for exists notes that envir is
> > > only there for backwards compatibility. Let's deprecate the extra args
> > > in exists and get and perhaps move the extra argument handling to C in
> > > the interim.  Similarly, the "assign" function does nothing with the
> > > "immediate" argument.
> > >
> > > I'd be interested to hear if there is any support for a "fetch"-like
> > > function (and/or deprecating some unused arguments).
> > >
> > > All the best,
> > > Pete
> > >
> > >
> > > ____________________
> > > Peter M. Haverty, Ph.D.
> > > Genentech, Inc.
> > > phaverty at gene.com
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > >
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
>
