[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
Peter Haverty
haverty.peter at gene.com
Thu Jan 8 22:44:56 CET 2015
Michael's idea has an interesting bonus that he and I discussed earlier.
It would be very convenient to have a container of key/value pairs. I
imagine many people often write this:
x <- mapply(names(x), x, FUN = function(k, v) {
    # work with key and value
})
especially ex-Perl people, who are accustomed to
while (my ($key, $value) = each(%some_hash)) { }
Perhaps there is room for additional discussion of using lists of SYMSXPs
in this manner. (If SYMSXPs are not safe enough, perhaps a looping construct
for named vectors that gives the illusion of iterating over a list of
two-tuples.)
Pete
____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
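Pete's key/value iteration idea can be approximated in base R today. A minimal sketch, assuming only a named vector or list: Map() (mapply() with SIMPLIFY = FALSE) hands each name and its value to the function pair-wise, much like Perl's each(). The helper name each_pair() is hypothetical and not part of R.

```r
# Hypothetical helper (not part of base R): call f(key, value) for every
# element of a named vector or list and return the results as a list.
each_pair <- function(x, f) {
  stopifnot(!is.null(names(x)))
  Map(f, names(x), x)
}

x <- c(a = 1, b = 2, c = 3)

# Build "key=value" strings from each pair; the result list is named by key.
pairs <- each_pair(x, function(k, v) paste0(k, "=", v))
print(unlist(pairs, use.names = FALSE))

# The same pattern as a plain for loop over the names
# (assumes the names are unique, since x[[k]] returns the first match).
for (k in names(x)) {
  v <- x[[k]]
  cat(k, "->", v, "\n")
}
```

This only gives the illusion of iterating over two-tuples; nothing here touches SYMSXPs.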
On Thu, Jan 8, 2015 at 11:57 AM, <luke-tierney at uiowa.edu> wrote:
> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>
>> If we do add an argument to get(), then it should be named consistently
>> with the ifnotfound argument of mget(). As mentioned, the possibility of a
>> NULL value is problematic. One solution is a sentinel value that indicates
>> an unbound value (like R_UnboundValue).
>>
>
> A null default is fine -- it's a default; if it isn't right for a
> particular case you can provide something else.
>
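To make the NULL ambiguity and the sentinel idea concrete, here is a minimal sketch, assuming R >= 3.2.0, where base R provides get0(x, envir, mode, inherits, ifnotfound). A fresh environment works as a sentinel because identical() compares environments by reference, so no ordinary bound value can be mistaken for it. The name .unbound is illustrative only and is not R_UnboundValue.

```r
# A unique sentinel: an environment is identical() only to itself.
.unbound <- new.env()

e <- new.env()
assign("x", NULL, envir = e)   # 'x' exists and its value is NULL

# With the NULL default, "bound to NULL" and "not bound" look the same.
a <- get0("x", envir = e, inherits = FALSE)
b <- get0("y", envir = e, inherits = FALSE)
print(c(is.null(a), is.null(b)))

# With the sentinel as ifnotfound, the two cases can be told apart.
a2 <- get0("x", envir = e, inherits = FALSE, ifnotfound = .unbound)
b2 <- get0("y", envir = e, inherits = FALSE, ifnotfound = .unbound)
print(c(identical(a2, .unbound), identical(b2, .unbound)))
```

As Luke says, NULL remains a reasonable default; the sentinel is only needed when NULL is a legitimate value.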
>
>> But another idea (and one pretty similar to John's) is to follow the SYMSXP
>> design at the C level, where there is a structure that points to the name
>> and a value. We already have SYMSXPs at the R level of course (name
>> objects) but they do not provide access to the value, which is typically
>> R_UnboundValue. But this does not even need to be implemented with SYMSXP.
>> The design would allow something like:
>>
>> binding <- getBinding("x", env)
>> if (hasValue(binding)) {
>>     x <- value(binding) # throws an error if none
>>     message(name(binding), " has value ", x)
>> }
>>
>> I think it is a bit verbose but readable, and it could be made fast. And I
>> think binding objects would be useful in other ways, as they are
>> essentially a "named object", for example when iterating over an
>> environment.
>>
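A minimal sketch of the binding API Michael describes, written in plain R. getBinding(), hasValue(), value() and name() are the hypothetical names from his message, not existing R functions. Here a binding is only a (name, environment) pair that re-reads the value on each access; it does not expose R's internal binding cells, so an rm() after getBinding() is simply observed rather than causing a dangling reference.

```r
# Hypothetical API, sketched in plain R. A "binding" is a name plus the
# environment it lives in; the value is looked up again on every access.
getBinding <- function(name, env = parent.frame()) {
  stopifnot(is.character(name), length(name) == 1L, is.environment(env))
  structure(list(name = name, env = env), class = "binding")
}

name <- function(binding) binding$name

hasValue <- function(binding) {
  exists(binding$name, envir = binding$env, inherits = FALSE)
}

value <- function(binding) {
  if (!hasValue(binding))
    stop("no value bound to '", binding$name, "'")
  get(binding$name, envir = binding$env, inherits = FALSE)
}

env <- new.env()
assign("x", 42, envir = env)

binding <- getBinding("x", env)
if (hasValue(binding)) {
  x <- value(binding)
  message(name(binding), " has value ", x)
}

# After rm(), the same binding object reports that it has no value.
rm("x", envir = env)
print(hasValue(binding))
```

This sidesteps only the rm() question; serialization and the cost of the extra lookups are not addressed.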
>
> This would need a lot more thought. Directly exposing the internals is
> definitely not something we want to do, as we may well want to change
> that design. But there are lots of other corner issues that would have
> to be thought through before going forward, such as what happens if an
> rm occurs between obtaining a binding object and doing something with
> it. Serialization would also need thinking through. To me, this doesn't
> seem like a worthwhile place to spend our efforts.
>
> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
> an argument to get() with missing giving current behavior may be OK
> too. Rewriting exists and get as .Primitives may be sufficient though.
>
> Best,
>
> luke
>
>
>> Michael
>>
>>
>>
>>
>> On Thu, Jan 8, 2015 at 6:03 AM, John Nolan <jpnolan at american.edu> wrote:
>>
>>> Adding an optional argument to get (and mget) like
>>>
>>> val <- get(name, where, ..., value.if.not.found=NULL ) (*)
>>>
>>> would be useful for many. HOWEVER, it is possible that there could be
>>> some confusion here: (*) can give a NULL because either x exists and
>>> has value NULL, or because x doesn't exist. If that matters, the user
>>> would need to be careful to specify a value.if.not.found that cannot
>>> be confused with a valid value of x.
>>>
>>> To avoid this difficulty, perhaps we want both: have Martin's
>>> getifexists() return a list with two elements:
>>> - a logical 'found' # = value returned by exists()
>>> - the 'value' itself
>>>
>>> Then implement get() as:
>>>
>>> get <- function(x, ..., value.if.not.found) {
>>>     if (missing(value.if.not.found)) {
>>>         a <- getifexists(x, ...)
>>>         if (!a$found) stop("object '", x, "' not found")
>>>     } else {
>>>         a <- getifexists(x, ..., value.if.not.found = value.if.not.found)
>>>     }
>>>     return(a$value)
>>> }
>>>
>>> Note that value.if.not.found has no default value above.
>>> It behaves exactly like the current get() if value.if.not.found
>>> is not specified, and if it is specified, it would be faster
>>> in the common situation mentioned below:
>>> if(exists(x,...)) { get(x,...) }
>>>
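A runnable sketch of John's two-part design, assuming lookups by name in a given environment. getifexists() and getWithDefault() are hypothetical names (the latter stands in for the modified get(), so base get() is not masked). Because it is built on exists() plus get(), it shows the semantics only; the speed benefit would need a single lookup in C.

```r
# Hypothetical: report whether 'x' was found and, if so, its value, so a
# NULL value can never be confused with "not found".
getifexists <- function(x, envir = parent.frame(), inherits = TRUE) {
  if (exists(x, envir = envir, inherits = inherits)) {
    list(found = TRUE, value = get(x, envir = envir, inherits = inherits))
  } else {
    list(found = FALSE, value = NULL)
  }
}

# Hypothetical get() variant: without value.if.not.found it signals an
# error like get(); with it, that value is returned for an unbound name.
getWithDefault <- function(x, envir = parent.frame(), inherits = TRUE,
                           value.if.not.found) {
  a <- getifexists(x, envir = envir, inherits = inherits)
  if (a$found) return(a$value)
  if (missing(value.if.not.found))
    stop("object '", x, "' not found")
  value.if.not.found
}

e <- new.env()
assign("n", NULL, envir = e)

r1 <- getifexists("n", envir = e, inherits = FALSE)   # found, value is NULL
r2 <- getifexists("m", envir = e, inherits = FALSE)   # not found
r3 <- getWithDefault("m", envir = e, inherits = FALSE, value.if.not.found = -1)
r4 <- tryCatch(getWithDefault("m", envir = e, inherits = FALSE),
               error = function(cond) conditionMessage(cond))
```

The 'found' flag is what removes the ambiguity John describes: r1 and r2 both carry a NULL value, but only r2 has found = FALSE.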
>>> John
>>>
>>> P.S. If you like dromedaries, call it valueIfNotFound ...
>>>
>>> ..............................................................
>>> John P. Nolan
>>> Math/Stat Department
>>> 227 Gray Hall, American University
>>> 4400 Massachusetts Avenue, NW
>>> Washington, DC 20016-8050
>>>
>>> jpnolan at american.edu voice: 202.885.3140
>>> web: academic2.american.edu/~jpnolan
>>> ..............................................................
>>>
>>>
>>> -----"R-devel" <r-devel-bounces at r-project.org> wrote: -----
>>> To: Martin Maechler <maechler at stat.math.ethz.ch>, R-devel at r-project.org
>>> From: Duncan Murdoch
>>> Sent by: "R-devel"
>>> Date: 01/08/2015 06:39AM
>>> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>>>
>>> On 08/01/2015 4:16 AM, Martin Maechler wrote:
>>> > In November, we had a "bug repository conversation"
>>> > with Peter Haverty and myself:
>>> >
>>> > https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
>>> >
>>> > where the bug report title started with
>>> >
>>> > --->> "exists" is a bottleneck for dispatch and package loading, ...
>>> >
>>> > Peter proposed an extra simplified and hence faster version of exists(),
>>> > and I commented
>>> >
>>> > > --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch> ---
>>> > > I'm very grateful that you've started exploring the bottlenecks of loading
>>> > > packages with many S4 classes (and methods)...
>>> > > and I hope we can make real progress there sooner rather than later.
>>> >
>>> > > OTOH, the `summaryRprof()` output in your vignette indicates that exists() may use
>>> > > up to 10% of the time spent in library(reportingTools), and your proposed
>>> > > speedups of exists() may reach ca. 30%, which is good and well worth
>>> > > considering, but still we can only expect a 2-3% speedup for package loading,
>>> > > which unfortunately is not much.
>>> >
>>> > > Still, I agree it is worth looking at exists() as you did ... and
>>> > > considering a fast simplified version of it in addition to the current
>>> > > exists() [I think].
>>> >
>>> > > BTW, as we are talking about enhancements here, maybe consider a further possibility:
>>> > > My subjective guess is that probably more than half of exists() uses are of the
>>> > > form
>>> >
>>> > > if(exists(name, where, .......)) {
>>> > > get(name, where, ....)
>>> > > ..
>>> > > } else {
>>> > > NULL / error() / .. or similar
>>> > > }
>>> >
>>> > > i.e. many exists() calls that return TRUE are immediately followed by the
>>> > > corresponding get() call, which repeats quite a bit of the lookup that exists()
>>> > > has already done.
>>> >
>>> > > Instead, I'd imagine a function, say getifexists(name, ...), that does both at
>>> > > once in the "exists is TRUE" case, but in a way that lets us easily keep the
>>> > > if(.) .. else clause above. One already existing approach would use
>>> >
>>> > > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {
>>> >
>>> > > ... (( work with xx )) ...
>>> >
>>> > > } else {
>>> > > NULL / error() / .. or similar
>>> > > }
>>> >
>>> > > but of course our C implementation would be more efficient and use more concise
>>> > > syntax {which should not look like error handling}. Follow-ups to this idea
>>> > > should really go to R-devel (the mailing list).
>>> >
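A runnable version of the two idioms Martin compares, assuming a lookup in a single environment (inherits = FALSE). The function names are hypothetical. Both return the same results; the tryCatch() variant performs one lookup but uses error handling for control flow, which is exactly what the proposed C-level function was meant to avoid.

```r
# Two-lookup idiom: exists() followed by get().
lookup_exists_get <- function(name, envir) {
  if (exists(name, envir = envir, inherits = FALSE)) {
    get(name, envir = envir, inherits = FALSE)
  } else {
    NULL
  }
}

# One-lookup idiom: attempt get() and treat any error as "not found".
lookup_trycatch <- function(name, envir) {
  res <- tryCatch(get(name, envir = envir, inherits = FALSE),
                  error = function(e) e)
  if (inherits(res, "error")) NULL else res
}

e <- new.env()
assign("a", 1:3, envir = e)

r_found   <- list(lookup_exists_get("a", e), lookup_trycatch("a", e))
r_missing <- list(lookup_exists_get("zz", e), lookup_trycatch("zz", e))
```

One subtle difference: if the stored object is itself a condition inheriting from "error", the tryCatch() idiom misreports it as "not found".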
>>> > And now I do follow up here myself:
>>> >
>>> > I found that 'getifexists()' is actually very simple to implement.
>>> > I have already tested it a bit, but have not yet committed it to R-devel
>>> > (the "R trunk" aka "master branch") because I'd like to get
>>> > public comments {RFC := Request For Comments}.
>>> >
>>>
>>> I don't like the name -- I'd prefer getIfExists. As Baath (2012, R
>>> Journal) pointed out, R names are very inconsistent in naming
>>> conventions, but lowerCamelCase is the most common choice. Second most
>>> common is period.separated, so an argument could be made for
>>> get.if.exists, but there's still the possibility of confusion with S3
>>> methods, and users of other languages where "." is an operator find it a
>>> little strange.
>>>
>>> If you don't like lowerCamelCase (and a lot of people don't), then I
>>> think underscore_separated is the next best choice, so I would use
>>> get_if_exists.
>>>
>>> Another possibility is to create no new name at all, and just add an
>>> optional parameter to get() (which, if present, acts as your value.if.not
>>> parameter and, if absent, keeps the current "object not found" error).
>>>
>>> Duncan Murdoch
>>>
>>>
>>> > My version of the help file {for both exists() and getifexists()}
>>> > rendered in text is
>>> >
>>> > ---------------------- help(getifexists) -------------------------------
>>> > Is an Object Defined?
>>> >
>>> > Description:
>>> >
>>> > Look for an R object of the given name and possibly return it
>>> >
>>> > Usage:
>>> >
>>> > exists(x, where = -1, envir = , frame, mode = "any",
>>> > inherits = TRUE)
>>> >
>>> > getifexists(x, where = -1, envir = as.environment(where),
>>> > mode = "any", inherits = TRUE, value.if.not = NULL)
>>> >
>>> > Arguments:
>>> >
>>> > x: a variable name (given as a character string).
>>> >
>>> > where: where to look for the object (see the details section); if
>>> > omitted, the function will search as if the name of the
>>> > object appeared unquoted in an expression.
>>> >
>>> > envir: an alternative way to specify an environment to look in, but
>>> > it is usually simpler to just use the 'where' argument.
>>> >
>>> > frame: a frame in the calling list. Equivalent to giving 'where' as
>>> > 'sys.frame(frame)'.
>>> >
>>> > mode: the mode or type of object sought: see the 'Details' section.
>>> >
>>> > inherits: should the enclosing frames of the environment be searched?
>>> >
>>> > value.if.not: the return value of 'getifexists(x, *)' when 'x' does not
>>> > exist.
>>> >
>>> > Details:
>>> >
>>> > The 'where' argument can specify the environment in which to look
>>> > for the object in any of several ways: as an integer (the position
>>> > in the 'search' list); as the character string name of an element
>>> > in the search list; or as an 'environment' (including using
>>> > 'sys.frame' to access the currently active function calls). The
>>> > 'envir' argument is an alternative way to specify an environment,
>>> > but is primarily there for back compatibility.
>>> >
>>> > This function looks to see if the name 'x' has a value bound to it
>>> > in the specified environment. If 'inherits' is 'TRUE' and a value
>>> > is not found for 'x' in the specified environment, the enclosing
>>> > frames of the environment are searched until the name 'x' is
>>> > encountered. See 'environment' and the 'R Language Definition'
>>> > manual for details about the structure of environments and their
>>> > enclosures.
>>> >
>>> > *Warning:* 'inherits = TRUE' is the default behaviour for R but
>>> > not for S.
>>> >
>>> > If 'mode' is specified then only objects of that type are sought.
>>> > The 'mode' may specify one of the collections '"numeric"' and
>>> > '"function"' (see 'mode'): any member of the collection will
>>> > suffice. (This is true even if a member of a collection is
>>> > specified, so for example 'mode = "special"' will seek any type of
>>> > function.)
>>> >
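A short sketch of the 'mode' and 'inherits' rules described in the Details above, using exists() on a throwaway environment. The object names are illustrative; the expectations follow the stated rule that any member of the "function" or "numeric" collection suffices.

```r
e <- new.env()
assign("f", function(x) x, envir = e)   # a closure
assign("i", 1L, envir = e)              # an integer

# mode = "function" matches closures (and builtins/specials).
has_f_fun     <- exists("f", envir = e, mode = "function", inherits = FALSE)
# Naming one member of the collection still matches any kind of function.
has_f_special <- exists("f", envir = e, mode = "special", inherits = FALSE)
# An integer satisfies mode = "numeric" ...
has_i_num     <- exists("i", envir = e, mode = "numeric", inherits = FALSE)
# ... but not mode = "function".
has_i_fun     <- exists("i", envir = e, mode = "function", inherits = FALSE)

# inherits: 'sum' is reached through the enclosing environments
# (global, then the search path down to base) unless inherits = FALSE.
sum_inherited <- exists("sum", envir = e)
sum_local     <- exists("sum", envir = e, inherits = FALSE)
```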
>>> > Value:
>>> >
>>> > 'exists():' Logical, true if and only if an object of the correct
>>> > name and mode is found.
>>> >
>>> > 'getifexists():' The object (as from 'get(x, *)') if 'exists(x, *)'
>>> > is true, otherwise 'value.if.not'.
>>> >
>>> > Note:
>>> >
>>> > With 'getifexists()', instead of the easy-to-read but somewhat
>>> > inefficient
>>> >
>>> > if (exists(myVarName, envir = myEnvir)) {
>>> > r <- get(myVarName, envir = myEnvir)
>>> > ## ... deal with r ...
>>> > }
>>> >
>>> > you can now use the more efficient (and slightly harder to read)
>>> >
>>> > if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
>>> > ## ... deal with r ...
>>> > }
>>> >
>>> > References:
>>> >
>>> > Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
>>> > Language_. Wadsworth & Brooks/Cole.
>>> >
>>> > See Also:
>>> >
>>> > 'get'. For quite a different kind of "existence" checking, namely
>>> > if function arguments were specified, 'missing'; and for yet a
>>> > different kind, namely if a file exists, 'file.exists'.
>>> >
>>> > Examples:
>>> >
>>> > ## Define a substitute function if necessary:
>>> > if(!exists("some.fun", mode = "function"))
>>> > some.fun <- function(x) { cat("some.fun(x)\n"); x }
>>> > search()
>>> > exists("ls", 2) # true even though ls is in pos = 3
>>> > exists("ls", 2, inherits = FALSE) # false
>>> >
>>> > ## These are true (in most circumstances):
>>> > identical(ls, getifexists("ls"))
>>> > identical(NULL, getifexists(".foo.bar.")) # default value.if.not = NULL(!)
>>> >
>>> > ----------------- end[ help(getifexists) ] -----------------------------
>>> >
>>> > ______________________________________________
>>> > R-devel at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>> >
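A semantics-only sketch of the proposed getifexists(), assuming the arguments in the Usage section above except that 'envir' defaults to the caller's frame instead of being derived from 'where'. It is built on exists() plus get(), so it still does two lookups and does not deliver the speedup, which needs a single lookup at the C level. For comparison, R 3.2.0 and later provide get0() with an 'ifnotfound' argument.

```r
# Hypothetical R-level stand-in for the proposed function.
getifexists <- function(x, envir = parent.frame(), mode = "any",
                        inherits = TRUE, value.if.not = NULL) {
  if (exists(x, envir = envir, mode = mode, inherits = inherits))
    get(x, envir = envir, mode = mode, inherits = inherits)
  else
    value.if.not
}

# The two identities from the Examples section.
same_ls  <- identical(ls, getifexists("ls"))
null_def <- identical(NULL, getifexists(".foo.bar."))

# The pattern from the Note section.
myEnvir <- new.env()
assign("myVar", 10, envir = myEnvir)
if (!is.null(r <- getifexists("myVar", envir = myEnvir))) {
  r <- r + 1   # ... deal with r ...
}
```

As John points out, the is.null() test cannot distinguish a variable bound to NULL from a missing one; pass a distinctive value.if.not when that matters.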
>>>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa
> Department of Statistics and Actuarial Science
> 241 Schaeffer Hall
> Iowa City, IA 52242
> Phone: 319-335-3386
> Fax: 319-335-3017
> email: luke-tierney at uiowa.edu
> WWW: http://www.stat.uiowa.edu