[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

Martin Maechler maechler at stat.math.ethz.ch
Thu Jan 8 16:02:50 CET 2015


> Adding an optional argument to get (and mget) like
> val <- get(name, where, ..., value.if.not.found=NULL )   (*)

> would be useful for many.  HOWEVER, it is possible that there could be 
> some confusion here: (*) can give a NULL because either x exists and 
> has value NULL, or because x doesn't exist.   If that matters, the user 
> would need to be careful about specifying a value.if.not.found that cannot 
> be confused with a valid value of x.  

Exactly -- well, of course: That problem { NULL can be the legit value of what you
want to get() } was the only reason to have a 'value.if.not' argument at all. 

Note that this is not about a universal replacement of 
the  if(exists(..)) { .. get(..) } idiom, but rather a
replacement of these in the cases where speed matters very much,
which is e.g. in the low level support code for S4 method dispatch.

'value.if.not.found':
Note that CRAN checks requires all arguments to be written in
full length.  Even though we have auto completion in ESS,
Rstudio or other good R IDE's,  I very much like to keep
function calls somewhat compact.

And yes, as you mention the dromedars aka 2-hump camels:  
getIfExist is already horrible to my taste (and "_" is not S-like; 
yes that's all very much a matter of taste and yes I'm from the
20th century).

> To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) 
> return a list with two values: 
>   - a boolean variable 'found'  # = value returned by exists( )
>   - a variable 'value'

> Then implement get( ) as:

> get <- function(x,...,value.if.not.found ) {

>   if( missing(value.if.not.found) ) {
>     a <- getifexists(x,... )
>     if (!a$found) error("x not found")
>   } else {
>     a <- getifexists(x,...,value.if.not.found )
>   }
>   return(a$value)
> }

Interesting...
Note that the above get() implementation would just be "conceptually", as 
all of this is also quite a bit about speed, and we do the
different cases in C anyway [via 'op' code].

> Note that value.if.not.found has no default value in above.
> It behaves exactly like current get does if value.if.not.found 
> is not specified, and if it is specified, it would be faster 
> in the common situation mentioned below:   
>      if(exists(x,...)) { get(x,...) }

Good... Let's talk about your getifexists() as I argue we'd keep
get() exactly as it is now anyway, if we use a new 3rd function (I keep
calling 'getifexists()' for now):

I think in that case, getifexists() would not even *need* an argument 
'value.if.not' (or 'value.if.not.found'); it rather would return a 
  list(found = *, value = *)
in any case.
Alternatively, it could return
  structure(<found>, value = *)

In the first case, our main use case would be

      if((r <- getifexists(x, *))$found) {
         ## work with  r$value
      }

in the 2nd case {structure} :

      if((r <- getifexists(x, *))) {
         ## work with  attr(r,"value")
      }

I think that (both cases) would still be a bit slower (for the above
most important use case) but probably not much
and it would like slightly more readable than my

       if (!is.null(r <- getifexists(x, *))) {
          ## work with  r
       }

After all of this, I think I'd still somewhat prefer my original proposal,
but not strongly -- I had originally also thought of returning the
two parts explicitly, but then tended to prefer the version that
behaved exactly like get() in the case the object is found.

... Nice interesting ideas! ... 
let the proposals and consideration flow ...

Martin


> John

> P.S. if you like dromedaries call it valueIfNotFound ...

:-) ;-)  
I don't .. as I said above, I already strongly dislike more than one hump. 
[ Each capital is one key stroke ("Shift") more ,
  and each "_" is two key strokes more on most key boards...,
  and I do like identifiers that I can also quickly pronounce on
  the phone or in teaching .. ]

>  ..............................................................
>  John P. Nolan
>  Math/Stat Department
>  227 Gray Hall,   American University
>  4400 Massachusetts Avenue, NW
>  Washington, DC 20016-8050
>  ..............................................................


> -----"R-devel" <r-devel-bounces at r-project.org> wrote: ----- 
> To: Martin Maechler <maechler at stat.math.ethz.ch>, R-devel at r-project.org
> From: Duncan Murdoch 
> Sent by: "R-devel" 
> Date: 01/08/2015 06:39AM
> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

> On 08/01/2015 4:16 AM, Martin Maechler wrote:
> > In November, we had a "bug repository conversation"
> > with Peter Hagerty and myself:
> > 
> >   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
> > 
> > where the bug report title started with
> > 
> >  --->>  "exists" is a bottleneck for dispatch and package loading, ...
> > 
> > Peter proposed an extra simplified and henc faster version of exists(),
> > and I commented
> > 
> >     > --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch> ---
> >     > I'm very grateful that you've started exploring the bottlenecks of loading
> >     > packages with many S4 classes (and methods)...
> >     > and I hope we can make real progress there rather sooner than later.
> > 
> >     > OTOH, your `summaryRprof()` in your vignette indicates that exists() may use
> >     > upto 10% of the time spent in library(reportingTools),  and your speedup
> >     > proposals of exist()  may go up to ca 30%  which is good and well worth
> >     > considering,  but still we can only expect 2-3% speedup for package loading
> >     > which unfortunately is not much.
> > 
> >     > Still I agree it is worth looking at exists() as you did  ... and 
> >     > consider providing a fast simplified version of it in addition to current
> >     > exists() [I think].
> > 
> >     > BTW, as we talk about enhancements here, maybe consider a further possibility:
> >     > My subjective guess is that probably more than half of exists() uses are of the
> >     > form
> > 
> >     > if(exists(name, where, .......)) {
> >     >    get(name, whare, ....)
> >     >    ..
> >     > } else { 
> >     >     NULL / error() / .. or similar
> >     > }
> > 
> >     > i.e. many exists() calls when returning TRUE are immediately followed by the
> >     > corresponding get() call which repeats quite a bit of the lookup that exists()
> >     > has done.
> > 
> >     > Instead, I'd imagine a function, say  getifexists(name, ...) that does both at
> >     > once in the "exists is TRUE" case but in a way we can easily keep the if(.) ..
> >     > else clause above.  One already existing approach would use
> > 
> >     > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {
> > 
> >     >   ... (( work with xx )) ...
> > 
> >     > } else  { 
> >     >    NULL / error() / .. or similar
> >     > }
> > 
> >     > but of course our C implementation would be more efficient and use more concise
> >     > syntax {which should not look like error handling}.   Follow ups to this idea
> >     > should really go to R-devel (the mailing list).
> > 
> > and now I do follow up here myself :
> > 
> > I found that  'getifexists()' is actually very simple to implement,
> > I have already tested it a bit, but not yet committed to R-devel
> > (the "R trunk" aka "master branch") because I'd like to get
> > public comments {RFC := Request For Comments}.
> > 

> I don't like the name -- I'd prefer getIfExists.  As Baath (2012, R
> Journal) pointed out, R names are very inconsistent in naming
> conventions, but lowerCamelCase is the most common choice.  Second most
> common is period.separated, so an argument could be made for
> get.if.exists, but there's still the possibility of confusion with S3
> methods, and users of other languages where "." is an operator find it a
> little strange.

> If you don't like lowerCamelCase (and a lot of people don't), then I
> think underscore_separated is the next best choice, so would use
> get_if_exists.

> Another possibility is to make no new name at all, and just add an
> optional parameter to get() (which if present acts as your value.if.not
> parameter, if not present keeps the current "object not found" error).

> Duncan Murdoch


> > My version of the help file {for both exists() and getifexists()}
> > rendered in text is
> > 
> > ---------------------- help(getifexists) -------------------------------
> > Is an Object Defined?
> > 
> > Description:
> > 
> >      Look for an R object of the given name and possibly return it
> > 
> > Usage:
> > 
> >      exists(x, where = -1, envir = , frame, mode = "any",
> >             inherits = TRUE)
> >      
> >      getifexists(x, where = -1, envir = as.environment(where),
> >                  mode = "any", inherits = TRUE, value.if.not = NULL)
> >      
> > Arguments:
> > 
> >        x: a variable name (given as a character string).
> > 
> >    where: where to look for the object (see the details section); if
> >           omitted, the function will search as if the name of the
> >           object appeared unquoted in an expression.
> > 
> >    envir: an alternative way to specify an environment to look in, but
> >           it is usually simpler to just use the ‘where’ argument.
> > 
> >    frame: a frame in the calling list.  Equivalent to giving ‘where’ as
> >           ‘sys.frame(frame)’.
> > 
> >     mode: the mode or type of object sought: see the ‘Details’ section.
> > 
> > inherits: should the enclosing frames of the environment be searched?
> > 
> > value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
> >           exist.
> > 
> > Details:
> > 
> >      The ‘where’ argument can specify the environment in which to look
> >      for the object in any of several ways: as an integer (the position
> >      in the ‘search’ list); as the character string name of an element
> >      in the search list; or as an ‘environment’ (including using
> >      ‘sys.frame’ to access the currently active function calls).  The
> >      ‘envir’ argument is an alternative way to specify an environment,
> >      but is primarily there for back compatibility.
> > 
> >      This function looks to see if the name ‘x’ has a value bound to it
> >      in the specified environment.  If ‘inherits’ is ‘TRUE’ and a value
> >      is not found for ‘x’ in the specified environment, the enclosing
> >      frames of the environment are searched until the name ‘x’ is
> >      encountered.  See ‘environment’ and the ‘R Language Definition’
> >      manual for details about the structure of environments and their
> >      enclosures.
> > 
> >      *Warning:* ‘inherits = TRUE’ is the default behaviour for R but
> >      not for S.
> > 
> >      If ‘mode’ is specified then only objects of that type are sought.
> >      The ‘mode’ may specify one of the collections ‘"numeric"’ and
> >      ‘"function"’ (see ‘mode’): any member of the collection will
> >      suffice.  (This is true even if a member of a collection is
> >      specified, so for example ‘mode = "special"’ will seek any type of
> >      function.)
> > 
> > Value:
> > 
> >      ‘exists():’ Logical, true if and only if an object of the correct
> >      name and mode is found.
> > 
> >      ‘getifexists():’ The object-as from ‘get(x, *)’- if ‘exists(x, *)’
> >      is true, otherwise ‘value.if.not’.
> > 
> > Note:
> > 
> >    With ‘getifexists()’, instead of the easy to read but somewhat
> >    inefficient
> >      
> >        if (exists(myVarName, envir = myEnvir)) {
> >          r <- get(myVarName, envir = myEnvir)
> >          ## ... deal with r ...
> >        }
> > 
> >    you now can use the more efficient (and slightly harder to read)
> >      
> >        if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
> >          ## ... deal with r ...
> >        }
> > 
> > References:
> > 
> >      Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
> >      Language_.  Wadsworth & Brooks/Cole.
> > 
> > See Also:
> > 
> >      ‘get’.  For quite a different kind of “existence” checking, namely
> >      if function arguments were specified, ‘missing’; and for yet a
> >      different kind, namely if a file exists, ‘file.exists’.
> > 
> > Examples:
> > 
> >      ##  Define a substitute function if necessary:
> >      if(!exists("some.fun", mode = "function"))
> >        some.fun <- function(x) { cat("some.fun(x)\n"); x }
> >      search()
> >      exists("ls", 2) # true even though ls is in pos = 3
> >      exists("ls", 2, inherits = FALSE) # false
> >      
> >      ## These are true (in most circumstances):
> >      identical(ls,   getifexists("ls"))
> >      identical(NULL, getifexists(".foo.bar.")) # default value.if.not = NULL(!)
> > 
> > ----------------- end[ help(getifexists) ] -----------------------------



More information about the R-devel mailing list