[Rd] Increase transparency: suggestion on how to avoid namespaces and/or unnecessary overwrites of existing functions

Sat Oct 1 18:46:53 CEST 2011

       When selecting names for functions and variables, I sometimes use 
library(sos) to look for existing conflicts with other packages.  This 
won't solve all the problems Janko mentioned, but it can help avoid 
some.  Spencer


On 10/1/2011 9:11 AM, Dominick Samperi wrote:
> On Tue, Aug 23, 2011 at 2:23 PM, Janko Thyson
> <janko.thyson.rstuff at googlemail.com>  wrote:
>> Dear list,
>>
>> I'm aware of the fact that I posted on something related a while ago, but I
>> just can't shake this off and would like to ask you for an opinion:
>>
>> The problem:
>> Namespaces are great, but they don't resolve certain conflicts regarding
>> name clashes. There are more and more people out there trying to come up
>> with their own R packages, which is great also! Yet, it becomes more and
>> more likely that programmers will choose identical names for their exported
>> functions and/or that they add functionality to existing functions (i.e.
>> overwriting existing functions).
>> The whole process of which packages overwrite which functions is somewhat
>> obscure and in addition depends on their order in the search path. On the
>> other hand, it is not possible to use "namespace" functionality (i.e.
>> 'namespace::fun()'; also less efficient than direct call; see illustration
>> below) during early stages of the development process (i.e. the package is
>> not finished yet) as there is no namespace available yet.
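
A minimal sketch of the order dependence described above, using only base R. The names `demo:pkgA`, `demo:pkgB` and `clash_demo` are made up for illustration; two attached lists merely stand in for two packages that export the same function name.

```r
# Two lists stand in for two packages exporting the same function name.
# All names here are hypothetical and exist only for this demonstration.
pkgA <- list(clash_demo = function() "A")
pkgB <- list(clash_demo = function() "B")

attach(pkgA, name = "demo:pkgA")
attach(pkgB, name = "demo:pkgB")   # attached last, so it is searched first

where  <- find("clash_demo")       # every search-path entry defining the name
winner <- clash_demo()             # the entry searched first wins

print(where)
print(winner)

# conflicts() reports, per search-path entry, the names involved in masking.
print(conflicts(detail = TRUE)[c("demo:pkgB", "demo:pkgA")])

# Clean up so the demonstration leaves the search path unchanged.
detach("demo:pkgB")
detach("demo:pkgA")
```

Swapping the two attach() calls changes which definition is called, which is the obscurity complained about above.
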
>>
>> I know of at least two cases where such overwrites (I think it's called
>> masking, right?) led to some confusion at our chair:
>> 1) loading package forecast overwrites certain functions in stats which made
>> some code refactoring necessary
>> 2) loading package 'R.utils' followed by package 'roxygen' overwrites
>> 'parse.default()', which results in errors for something like
>> 'eval(parse(text="a<- 1"))' (see illustration below)
>> And I'm sure the community could come up with lots more of such scenarios.
>>
>> Suggestions:
>> 1) In order to avoid name clashes/unintended overwrites, how about switching
>> to a coding paradigm that explicitly (and automatically) includes a
>> package's name in all its functions' names once code is turned into a real
>> package? E.g., getting used to "preemptively" type 'package_fun()' or
>> 'package.fun()' instead of just 'fun()'. Better to be safe than sorry,
>> right? This could be realized pretty easily (see example below) and, IMHO,
>> would significantly increase transparency.
>> 2) In order to avoid intended (but for the user often pretty obscure)
>> overwrites of existing functions, we could use the same mechanism together
>> with the "rule": just don't provide any functions that overwrite existing
>> ones, rather prepend your version of that function with your package name
>> and leave it up to the user which version he wants to call.
> Experts from the Lisp-Stat community have added a number
> of functions to R that were inspired by Lisp, but one feature that apparently
> was not added is the shadowing feature of Common Lisp. Here the default
> behavior is not to permit packages to import conflicting names unless
> explicit shadowing directives are specified.
>
> Arguably a package is not intended to be used like a callable library,
> yet this is the way they are often used in the R context. This kind of
> shadowing tool might help to make this practice safer, at the expense
> of requiring the developer to specify explicit shadowing directives.
>
> Dominick
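
A rough sketch of what such a shadowing check could look like in plain R. `attach_strict()` and its `shadow` argument are hypothetical, not an existing R facility, and a named list again stands in for a package's exports.

```r
# Hypothetical helper: attach `exports` only if every name that clashes with
# a function already visible from the global environment is listed in `shadow`.
attach_strict <- function(exports, name, shadow = character()) {
    visible <- vapply(
        names(exports),
        function(nm) exists(nm, envir = globalenv(), mode = "function"),
        logical(1)
    )
    clashes    <- names(exports)[visible]
    unshadowed <- setdiff(clashes, shadow)
    if (length(unshadowed) > 0) {
        stop("not attaching '", name, "': ",
             paste(unshadowed, collapse = ", "),
             " clashes with an existing function; list it in 'shadow' to allow this")
    }
    attach(exports, name = name, warn.conflicts = FALSE)
    invisible(clashes)
}

# 'jitter' already exists in package base; 'fresh_demo_fn' is a made-up name.
demo <- list(jitter = function(x, ...) x, fresh_demo_fn = function() "new")

# Refused: the clash with 'jitter' has not been declared.
refused <- tryCatch(attach_strict(demo, "demo:strict"),
                    error = function(e) conditionMessage(e))
print(refused)

# Accepted: the clash is declared explicitly.
shadowed <- attach_strict(demo, "demo:strict", shadow = "jitter")
attached <- "demo:strict" %in% search()
detach("demo:strict")
```

The cost is the one named above: the developer has to spell out each intended overwrite.
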
>
>> At the moment, all of this is probably not that big of a deal yet, but my
>> suggestion has more of a mid-term/long-term character.
>>
>> Below you find a little illustration. I'm probably asking too much, but it'd
>> be great if we could get a little discussion going on how to improve the way
>> of loading packages!
>>
>> Best regards and thanks for R and all its packages!
>> Janko
>>
>> ################################################################################
>> # PROOF OF CONCEPT
>> ################################################################################
>>
>> # 1) PROBLEM
>> # IMHO, with the number of packages submitted to CRAN constantly increasing,
>> # over time we will be likely to see problems with respect to name clashes.
>> # The main reasons I see for this are the following:
>> # a) package developers picking identical names for their exported functions
>> # b) package developers overwriting base functions in order to add
>> #    functionality to existing functions
>> # c) ...
>> #
>> # This can create scenarios in which the user might not exactly know that
>> # he/she is using a 'modified' version of a specific function. More so, the
>> # user needs to carefully read the description of each new package he plans
>> # to use in order to find out which functions are exported and which
>> # existing functions might be overwritten. This in turn might imply that the
>> # user's existing code needs to be refactored (i.e. instead of using 'fun()'
>> # it might now be necessary to type 'namespace::fun()' to be sure that the
>> # desired function is called).
>>
>> # 2) SUGGESTED SOLUTION
>> # That being said, why don't we switch to a 'preemptive' coding paradigm
>> # where the default way of calling functions includes the specification of
>> # its namespace? In principle, the functionality offered by
>> # 'namespace::fun()' gets the job done.
>> # BUT:
>> # a) it is slower compared to the direct way of calling a function
>> #    (see illustration below).
>> # b) this option is not available throughout the development process of a
>> #    package as there is no namespace yet and there's no way to emulate one.
>> #    This in turn means that even though a package developer would buy into
>> #    strictly using 'mypkg::fun()' throughout his package code, he can only
>> #    do so at the very final stage of the process RIGHT before turning his
>> #    code into a working package (when he's absolutely sure everything is
>> #    working as planned). For debugging he would need to go back to using
>> #    'fun()'. Pretty cumbersome.
>>
>> # So how about simply automatically prepending a given function's name with
>> # the package's name for each package that is built (e.g. 'pkg.fun()' or
>> # 'pkg_fun()')? In the end, this would just be a small change for new
>> # packages without a significant decrease of performance and it could also
>> # be realized at early stages of the development process (see illustration
>> # below).
>>
>> # 3) ILLUSTRATION
>>
>> # Example case where base function 'parse.default' is overwritten:
>> parse(text="a<- 5")    # Works
>> require(R.utils)
>> require(roxygen)
>> parse(text="a<- 5")    # Does not work anymore
>>
>> ############ START A NEW R SESSION BEFORE YOU CONTINUE ############
>>
>> # Inefficiency of 'namespace::fun()':
>> require(microbenchmark)
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(base::parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> # Can be made up by explicit assignment:
>> foo<- base::parse
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(foo(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
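
One likely reason for the ratios above, sketched with base R only so that it runs without microbenchmark: `base::parse` is itself a call to the function `::`, which looks the name up in the namespace on each use, whereas a local copy such as `foo` is a plain symbol lookup. The repetition count `n` is an arbitrary choice, and the timings will differ by machine and R version.

```r
n <- 1e5                           # arbitrary repetition count

foo <- base::parse                 # resolve the namespace lookup once

# Time only the act of resolving the function, not calling it.
t_qualified <- system.time(for (i in seq_len(n)) base::parse)[["elapsed"]]
t_local     <- system.time(for (i in seq_len(n)) foo)[["elapsed"]]

# Both expressions yield the same function object; only the lookup differs.
same_function <- identical(foo, base::parse)

print(c(qualified = t_qualified, local = t_local))
print(same_function)
```
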
>>
>> # Automatically prepend function names:
>> processNamespaces<- function(
>>     do.global=FALSE,
>>     do.verbose=FALSE,
>>     .delim.name="_",
>>     ...
>> ){
>>     srch.list.0<- search()
>>     srch.list<- gsub("package:", "", srch.list.0)
>>     if(!do.global){
>>         assign(".NS", new.env(), envir=.GlobalEnv)
>>     }
>>     out<- lapply(1:length(srch.list), function(x.pkg){
>>         pkg<- srch.list[x.pkg]
>>
>>         # SKIP LIST
>>         if(pkg %in% c(".GlobalEnv", "Autoloads")){
>>             return(NULL)
>>         }
>>         # /
>>
>>         # TARGET ENVIR
>>         if(!do.global){
>>             # ADD PACKAGE TO .NS ENVIRONMENT
>>             envir<- eval(substitute(
>>                 assign(PKG, new.env(), envir=.NS),
>>                 list(PKG=pkg)
>>             ))
>>             # /
>> #            envir<- get(pkg, envir=.NS, inherits=FALSE)
>>             envir.msg<- paste(".NS$", pkg, sep="")
>>         } else {
>>             envir<- .GlobalEnv
>>             envir.msg<- ".GlobalEnv"
>>         }
>>         # /
>>
>>         # PROCESS FUNCTIONS
>>         cnt<- ls(pos=x.pkg)
>>         out<- unlist(sapply(cnt, function(x.cnt){
>>             value<- get(x.cnt, pos=x.pkg, inherits=FALSE)
>>             obj.mod<- paste(pkg, x.cnt, sep=.delim.name)
>>             if(!is.function(value)){
>>                 return(NULL)
>>             }
>>             if(do.verbose){
>>                 cat(paste("Assigning '", obj.mod, "' to '", envir.msg,
>>                     "'", sep=""), sep="\n")
>>             }
>>             eval(substitute(
>>                 assign(OBJ.MOD, value, envir=ENVIR),
>>                 list(
>>                     OBJ.MOD=obj.mod,
>>                     ENVIR=envir
>>                 )
>>             ))
>>             return(obj.mod)
>>         }))
>>         names(out)<- NULL
>>         # /
>>         return(out)
>>     })
>>     names(out)<- srch.list
>>     return(out)
>> }
>>
>> # +++++
>>
>> funs<- processNamespaces(do.verbose=TRUE)
>> ls(.NS)
>> ls(.NS$base)
>> .NS$base$base_parse
>>
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> #+++++
>>
>> funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE)
>> base_parse
>>
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(base_parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
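
For comparison, a more compact sketch of the same prefixing idea for a single package, using `getNamespaceExports()` and `getExportedValue()` from base R. The helper name `prefix_exports()` is hypothetical, and unlike `processNamespaces()` it reads one package's exports rather than walking the whole search path.

```r
# Hypothetical helper: return an environment holding every exported function
# of `pkg` under the name '<pkg><sep><function>'.
prefix_exports <- function(pkg, sep = "_") {
    nms  <- getNamespaceExports(pkg)
    vals <- lapply(nms, function(nm) getExportedValue(pkg, nm))
    names(vals) <- paste(pkg, nms, sep = sep)
    # Keep functions only, as processNamespaces() does.
    list2env(Filter(is.function, vals), envir = new.env())
}

ns <- prefix_exports("stats")

# The prefixed name refers to the very same function object.
same <- identical(ns$stats_median, stats::median)
med  <- ns$stats_median(c(1, 3, 2))

print(same)
print(med)
```

Because the result lives in its own environment, nothing on the search path is masked.
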
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel