[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
Michael Lawrence
lawrence.michael at gene.com
Thu Jan 8 15:58:00 CET 2015
If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a
NULL value is problematic. One solution is a sentinel value that indicates
an unbound value (like R_UnboundValue).
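To make the sentinel idea concrete, here is a minimal R-level sketch. It assumes a freshly created environment is an acceptable stand-in for R_UnboundValue, since such an environment is identical() only to itself. The names .unbound, getOrUnbound() and isUnbound() are made up for illustration and are not part of R; only mget() and its ifnotfound argument are existing API.

```r
# Hypothetical sentinel: a fresh environment is identical() only to itself,
# so it cannot be confused with NULL or any ordinary stored value.
.unbound <- new.env(parent = emptyenv())

# Illustrative helper (not an existing R function): a single lookup via mget(),
# returning the sentinel when the name is not bound.
getOrUnbound <- function(name, envir, inherits = TRUE) {
  mget(name, envir = envir, inherits = inherits,
       ifnotfound = list(.unbound))[[1L]]
}

# TRUE only for the sentinel itself.
isUnbound <- function(value) identical(value, .unbound)

e <- new.env()
assign("x", NULL, envir = e)

isUnbound(getOrUnbound("x", e, inherits = FALSE))  # FALSE: x is bound (to NULL)
isUnbound(getOrUnbound("y", e, inherits = FALSE))  # TRUE: y is not bound
```

The sketch only distinguishes "bound to NULL" from "not bound"; it does not stop someone from storing the sentinel itself in an environment.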
But another idea (and one pretty similar to John's) is to follow the SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:
binding <- getBinding("x", env)
if (hasValue(binding)) {
  x <- value(binding) # throws an error if none
  message(name(binding), " has value ", x)
}
That, I think, is a bit verbose but readable and could be made fast. And I
think binding objects would be useful in other ways, as they are
essentially a "named object". For example, when iterating over an
environment.
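For what it is worth, the interface above can be prototyped in plain R. The following is only a rough sketch under stated assumptions: getBinding(), hasValue(), value() and name() are the hypothetical names used above (none exist in R), the binding is modelled as an S3 list rather than anything like a SYMSXP, and a private sentinel passed to mget() keeps it to one lookup.

```r
# Hypothetical binding object: a name, a found flag, and the value (if any).
getBinding <- function(name, envir, inherits = TRUE) {
  unbound <- new.env(parent = emptyenv())  # private sentinel for this call
  val <- mget(name, envir = envir, inherits = inherits,
              ifnotfound = list(unbound))[[1L]]
  found <- !identical(val, unbound)
  structure(list(name = name, found = found, value = if (found) val),
            class = "binding")
}

hasValue <- function(binding) binding$found

name <- function(binding) binding$name

value <- function(binding) {
  # Mirrors get(): signal an error when there is no value.
  if (!binding$found) stop("object '", binding$name, "' not found")
  binding$value
}

env <- new.env()
assign("x", NULL, envir = env)

binding <- getBinding("x", env, inherits = FALSE)
if (hasValue(binding)) {
  # deparse() so that a NULL value is still shown in the message
  message(name(binding), " has value ", deparse(value(binding)))
}
```

Because 'found' is stored separately from 'value', a binding whose value is NULL is still reported as having a value, which is the ambiguity the ifnotfound = NULL default cannot resolve.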
Michael
On Thu, Jan 8, 2015 at 6:03 AM, John Nolan <jpnolan at american.edu> wrote:
> Adding an optional argument to get (and mget) like
>
> val <- get(name, where, ..., value.if.not.found=NULL ) (*)
>
> would be useful for many. HOWEVER, it is possible that there could be
> some confusion here: (*) can give a NULL because either x exists and
> has value NULL, or because x doesn't exist. If that matters, the user
> would need to be careful about specifying a value.if.not.found that cannot
> be confused with a valid value of x.
>
> To avoid this difficulty, perhaps we want both: have Martin's getifexists()
> return a list with two values:
> - a boolean variable 'found' # = value returned by exists( )
> - a variable 'value'
>
> Then implement get( ) as:
>
> get <- function(x, ..., value.if.not.found) {
>
>   if (missing(value.if.not.found)) {
>     a <- getifexists(x, ...)
>     if (!a$found) stop("x not found")
>   } else {
>     a <- getifexists(x, ..., value.if.not.found)
>   }
>   return(a$value)
> }
>
> Note that value.if.not.found has no default value in above.
> It behaves exactly like current get does if value.if.not.found
> is not specified, and if it is specified, it would be faster
> in the common situation mentioned below:
> if(exists(x,...)) { get(x,...) }
>
> John
>
> P.S. if you like dromedaries call it valueIfNotFound ...
>
> ..............................................................
> John P. Nolan
> Math/Stat Department
> 227 Gray Hall, American University
> 4400 Massachusetts Avenue, NW
> Washington, DC 20016-8050
>
> jpnolan at american.edu voice: 202.885.3140
> web: academic2.american.edu/~jpnolan
> ..............................................................
>
>
> -----"R-devel" <r-devel-bounces at r-project.org> wrote: -----
> To: Martin Maechler <maechler at stat.math.ethz.ch>, R-devel at r-project.org
> From: Duncan Murdoch
> Sent by: "R-devel"
> Date: 01/08/2015 06:39AM
> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>
> On 08/01/2015 4:16 AM, Martin Maechler wrote:
> > In November, we had a "bug repository conversation"
> > with Peter Hagerty and myself:
> >
> > https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
> >
> > where the bug report title started with
> >
> > --->> "exists" is a bottleneck for dispatch and package loading, ...
> >
> > Peter proposed an extra simplified and hence faster version of exists(),
> > and I commented
> >
> > > --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch> ---
> > > I'm very grateful that you've started exploring the bottlenecks of
> > > loading packages with many S4 classes (and methods)...
> > > and I hope we can make real progress there rather sooner than later.
> >
> > > OTOH, your `summaryRprof()` in your vignette indicates that exists()
> > > may use up to 10% of the time spent in library(reportingTools), and
> > > your speedup proposals of exists() may go up to ca 30% which is good
> > > and well worth considering, but still we can only expect 2-3% speedup
> > > for package loading which unfortunately is not much.
> >
> > > Still I agree it is worth looking at exists() as you did ... and
> > > consider providing a fast simplified version of it in addition to
> > > current exists() [I think].
> >
> > > BTW, as we talk about enhancements here, maybe consider a further
> > > possibility: My subjective guess is that probably more than half of
> > > exists() uses are of the form
> >
> > > if(exists(name, where, .......)) {
> > >   get(name, where, ....)
> > >   ..
> > > } else {
> > >   NULL / error() / .. or similar
> > > }
> >
> > > i.e. many exists() calls when returning TRUE are immediately followed
> > > by the corresponding get() call which repeats quite a bit of the
> > > lookup that exists() has done.
> >
> > > Instead, I'd imagine a function, say getifexists(name, ...) that does
> > > both at once in the "exists is TRUE" case but in a way we can easily
> > > keep the if(.) .. else clause above. One already existing approach
> > > would use
> >
> > > if(!inherits(tryCatch(xx <- get(name, where, ...),
> > >                       error=function(e)e), "error")) {
> >
> > >   ... (( work with xx )) ...
> >
> > > } else {
> > >   NULL / error() / .. or similar
> > > }
> >
> > > but of course our C implementation would be more efficient and use
> > > more concise syntax {which should not look like error handling}.
> > > Follow ups to this idea should really go to R-devel (the mailing list).
> >
> > and now I do follow up here myself :
> >
> > I found that 'getifexists()' is actually very simple to implement,
> > I have already tested it a bit, but not yet committed to R-devel
> > (the "R trunk" aka "master branch") because I'd like to get
> > public comments {RFC := Request For Comments}.
> >
>
> I don't like the name -- I'd prefer getIfExists. As Baath (2012, R
> Journal) pointed out, R names are very inconsistent in naming
> conventions, but lowerCamelCase is the most common choice. Second most
> common is period.separated, so an argument could be made for
> get.if.exists, but there's still the possibility of confusion with S3
> methods, and users of other languages where "." is an operator find it a
> little strange.
>
> If you don't like lowerCamelCase (and a lot of people don't), then I
> think underscore_separated is the next best choice, so would use
> get_if_exists.
>
> Another possibility is to make no new name at all, and just add an
> optional parameter to get() (which if present acts as your value.if.not
> parameter, if not present keeps the current "object not found" error).
>
> Duncan Murdoch
>
>
> > My version of the help file {for both exists() and getifexists()}
> > rendered in text is
> >
> > ---------------------- help(getifexists) -------------------------------
> > Is an Object Defined?
> >
> > Description:
> >
> > Look for an R object of the given name and possibly return it
> >
> > Usage:
> >
> > exists(x, where = -1, envir = , frame, mode = "any",
> > inherits = TRUE)
> >
> > getifexists(x, where = -1, envir = as.environment(where),
> > mode = "any", inherits = TRUE, value.if.not = NULL)
> >
> > Arguments:
> >
> > x: a variable name (given as a character string).
> >
> > where: where to look for the object (see the details section); if
> > omitted, the function will search as if the name of the
> > object appeared unquoted in an expression.
> >
> > envir: an alternative way to specify an environment to look in, but
> > it is usually simpler to just use the ‘where’ argument.
> >
> > frame: a frame in the calling list. Equivalent to giving ‘where’ as
> > ‘sys.frame(frame)’.
> >
> > mode: the mode or type of object sought: see the ‘Details’ section.
> >
> > inherits: should the enclosing frames of the environment be searched?
> >
> > value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
> > exist.
> >
> > Details:
> >
> > The ‘where’ argument can specify the environment in which to look
> > for the object in any of several ways: as an integer (the position
> > in the ‘search’ list); as the character string name of an element
> > in the search list; or as an ‘environment’ (including using
> > ‘sys.frame’ to access the currently active function calls). The
> > ‘envir’ argument is an alternative way to specify an environment,
> > but is primarily there for back compatibility.
> >
> > This function looks to see if the name ‘x’ has a value bound to it
> > in the specified environment. If ‘inherits’ is ‘TRUE’ and a value
> > is not found for ‘x’ in the specified environment, the enclosing
> > frames of the environment are searched until the name ‘x’ is
> > encountered. See ‘environment’ and the ‘R Language Definition’
> > manual for details about the structure of environments and their
> > enclosures.
> >
> > *Warning:* ‘inherits = TRUE’ is the default behaviour for R but
> > not for S.
> >
> > If ‘mode’ is specified then only objects of that type are sought.
> > The ‘mode’ may specify one of the collections ‘"numeric"’ and
> > ‘"function"’ (see ‘mode’): any member of the collection will
> > suffice. (This is true even if a member of a collection is
> > specified, so for example ‘mode = "special"’ will seek any type of
> > function.)
> >
> > Value:
> >
> > ‘exists():’ Logical, true if and only if an object of the correct
> > name and mode is found.
> >
> > ‘getifexists():’ The object-as from ‘get(x, *)’- if ‘exists(x, *)’
> > is true, otherwise ‘value.if.not’.
> >
> > Note:
> >
> > With ‘getifexists()’, instead of the easy to read but somewhat
> > inefficient
> >
> > if (exists(myVarName, envir = myEnvir)) {
> >   r <- get(myVarName, envir = myEnvir)
> >   ## ... deal with r ...
> > }
> >
> > you now can use the more efficient (and slightly harder to read)
> >
> > if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
> >   ## ... deal with r ...
> > }
> >
> > References:
> >
> > Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
> > Language_. Wadsworth & Brooks/Cole.
> >
> > See Also:
> >
> > ‘get’. For quite a different kind of “existence” checking, namely
> > if function arguments were specified, ‘missing’; and for yet a
> > different kind, namely if a file exists, ‘file.exists’.
> >
> > Examples:
> >
> > ## Define a substitute function if necessary:
> > if(!exists("some.fun", mode = "function"))
> >   some.fun <- function(x) { cat("some.fun(x)\n"); x }
> > search()
> > exists("ls", 2) # true even though ls is in pos = 3
> > exists("ls", 2, inherits = FALSE) # false
> >
> > ## These are true (in most circumstances):
> > identical(ls, getifexists("ls"))
> > identical(NULL, getifexists(".foo.bar.")) # default value.if.not = NULL(!)
> >
> > ----------------- end[ help(getifexists) ] -----------------------------
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >