[Rd] Increase transparency: suggestion on how to avoid namespaces and/or unnecessary overwrites of existing functions
Spencer Graves
spencer.graves at prodsyse.com
Sat Oct 1 18:46:53 CEST 2011
When selecting names for functions and variables, I sometimes use
library(sos) to look for existing conflicts with other packages. This
won't solve all the problems Janko mentioned, but it can help avoid
some.

Spencer
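A minimal sketch of that kind of check, assuming the sos package is installed and its search service is reachable. The candidate name "parse" is only an illustration, and the local check with find() is an addition that needs nothing beyond base R.

```r
# Candidate function name to check before exporting it (illustrative only).
candidate <- "parse"

# Local check: which attached packages already define a function by this name?
local_hits <- find(candidate, mode = "function")
print(local_hits)

# Wider check across contributed packages with sos. This is only attempted in
# an interactive session with sos installed, since it needs network access.
if (interactive() && requireNamespace("sos", quietly = TRUE)) {
  hits <- sos::findFn(candidate)
  print(nrow(hits))  # number of matching help pages found
}
```

The local check only sees packages that are currently attached, so it complements the sos search rather than replacing it.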
On 10/1/2011 9:11 AM, Dominick Samperi wrote:
> On Tue, Aug 23, 2011 at 2:23 PM, Janko Thyson
> <janko.thyson.rstuff at googlemail.com> wrote:
>> Dear list,
>>
>> I'm aware of the fact that I posted on something related a while ago, but I
>> just can't sweat this off and would like to ask you for an opinion:
>>
>> The problem:
>> Namespaces are great, but they don't resolve certain conflicts regarding
>> name clashes. There are more and more people out there trying to come up
>> with their own R packages, which is great also! Yet, it becomes more and
>> more likely that programmers will choose identical names for their exported
>> functions and/or that they add functionality to existing functions (i.e.
>> overwriting existing functions).
>> The whole process of which packages overwrite which functions is somewhat
>> obscure and in addition depends on their order in the search path. On the
>> other hand, it is not possible to use "namespace" functionality (i.e.
>> 'namespace::fun()'; also less efficient than direct call; see illustration
>> below) during early stages of the development process (i.e. the package is
>> not finished yet) as there is no namespace available yet.
>>
>> I know of at least two cases where such overwrites (I think it's called
>> masking, right?) led to some confusion at our chair:
>> 1) loading package forecast overwrites certain functions in stats which made
>> some code refactoring necessary
>> 2) loading package 'R.utils' followed by package 'roxygen' overwrites
>> 'parse.default()' which results in errors for something like
>> 'eval(parse(text="a<- 1"))' ; see illustration below)
>> And I'm sure the community could come up with lots more of such scenarios.
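The order dependence described above can be inspected with base R tools. The following is a minimal sketch that simulates a masking package by attaching a plain list. The name "demo:masker" and the stand-in parse() are hypothetical and are not taken from forecast, R.utils or roxygen.

```r
# Simulate a package that masks base::parse by attaching a list of functions
# near the front of the search path (stand-in for a real package).
attach(list(parse = function(...) stop("masked parse() called")),
       name = "demo:masker", warn.conflicts = FALSE)

# Where is 'parse' defined, in search-path order? The first hit wins.
where <- find("parse", mode = "function")
print(where)

# List the names that are defined more than once, grouped by search-path entry.
masked <- conflicts(detail = TRUE)
print(masked[["demo:masker"]])

# The fully qualified call is unaffected by the mask.
expr <- base::parse(text = "a <- 1")
eval(expr)

# Clean up the simulated package.
detach("demo:masker", character.only = TRUE)
```

The stand-in is listed ahead of package:base, so an unqualified parse() would reach it first, while base::parse() still resolves to the original.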
>>
>> Suggestions:
>> 1) In order to avoid name clashes/unintended overwrites, how about switching
>> to a coding paradigm that explicitly (and automatically) includes a
>> package's name in all its functions' names once code is turned into a real
>> package? E.g., getting used to "preemptively" type 'package_fun()' or
>> 'package.fun()' instead of just 'fun()'. Better to be safe than sorry,
>> right? This could be realized pretty easily (see example below) and, IMHO,
>> would significantly increase transparency.
>> 2) In order to avoid intended (but for the user often pretty obscure)
>> overwrites of existing functions, we could use the same mechanism together
>> with the "rule": just don't provide any functions that overwrite existing
>> ones, rather prepend your version of that function with your package name
>> and leave it up to the user which version he wants to call.
> Experts from the Lisp-Stats community have added a number
> of functions to R that were inspired by Lisp, but one feature that apparently
> was not added is the shadowing feature of Common Lisp. Here the default
> behavior is not to permit packages to import conflicting names unless
> explicit shadowing directives are specified.
>
> Arguably a package is not intended to be used like a callable library,
> yet this is the way they are often used in the R context. This kind of
> shadowing tool might help to make this practice safer, at the expense
> of requiring the developer to specify explicit shadowing directives.
>
> Dominick
>
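The shadowing idea can be sketched in plain R. This is only an illustration under stated assumptions: attach_strict() is a hypothetical helper, not an existing R function, and a list of functions stands in for a package's exports.

```r
# Hypothetical helper (not part of R): attach a list of functions only if none
# of their names is already visible as a function, unless the name is
# explicitly listed in 'shadow'.
attach_strict <- function(funs, name, shadow = character()) {
  visible <- vapply(names(funs), exists, logical(1),
                    envir = globalenv(), mode = "function")
  clashes <- setdiff(names(funs)[visible], shadow)
  if (length(clashes) > 0L) {
    stop("refusing to mask: ", paste(clashes, collapse = ", "),
         "; list these names in 'shadow' to allow it")
  }
  attach(funs, name = name, warn.conflicts = FALSE)
  invisible(names(funs))
}

# Stand-in for a package that exports its own 'parse' (illustrative only).
demo_pkg <- list(parse = function(...) "demo parse")

# Without a shadowing directive the clash is an error ...
res <- try(attach_strict(demo_pkg, "demo:pkg"), silent = TRUE)
print(inherits(res, "try-error"))

# ... with an explicit directive the mask is accepted.
attach_strict(demo_pkg, "demo:pkg", shadow = "parse")
out <- parse()
print(out)

# Clean up the simulated package.
detach("demo:pkg", character.only = TRUE)
```

The cost is the one mentioned above: whoever attaches the package has to name every function it is allowed to mask.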
>> At the moment, all of this is probably not that big of a deal yet, but my
>> suggestion has more of a mid-term/long-term character.
>>
>> Below you find a little illustration. I'm probably asking too much, but it'd
>> be great if we could get a little discussion going on how to improve the way
>> of loading packages!
>>
>> Best regards and thanks for R and all its packages!
>> Janko
>>
>> ################################################################################
>> # PROOF OF CONCEPT
>> ################################################################################
>>
>> # 1) PROBLEM
>> # IMHO, with the number of packages submitted to CRAN constantly increasing,
>> # over time we will be likely to see problems with respect to name clashes.
>> # The main reasons I see for this are the following:
>> # a) package developers picking identical names for their exported functions
>> # b) package developers overwriting base functions in order to add
>> # functionality to existing functions
>> # c) ...
>> #
>> # This can create scenarios in which the user might not exactly know that
>> # he/she is using a 'modified' version of a specific function. More so, the
>> # user needs to carefully read the description of each new package he plans
>> # to use in order to find out which functions are exported and which
>> # existing functions might be overwritten. This in turn might imply that the
>> # user's existing code needs to be refactored (i.e. instead of using 'fun()'
>> # it might now be necessary to type 'namespace::fun()' to be sure that the
>> # desired function is called).
>>
>> # 2) SUGGESTED SOLUTION
>> # That being said, why don't we switch to a 'preemptive' coding paradigm
>> # where the default way of calling functions includes the specification of
>> # its namespace? In principle, the functionality offered by
>> # 'namespace::fun()' gets the job done.
>> # BUT:
>> # a) it is slower compared to the direct way of calling a function
>> # (see illustration below).
>> # b) this option is not available throughout the development process of a
>> # package as there is no namespace yet and there's no way to emulate one.
>> # This in turn means that even though a package developer would buy into
>> # strictly using 'mypkg::fun()' throughout his package code, he can only
>> # do so at the very final stage of the process RIGHT before turning his
>> # code into a working package (when he's absolutely sure everything is
>> # working as planned). For debugging he would need to go back to using
>> # 'fun()'. Pretty cumbersome.
>>
>> # So how about simply automatically prepending a given function's name with
>> # the package's name for each package that is built (e.g. 'pkg.fun()' or
>> # 'pkg_fun()')? In the end, this would just be a small change for new
>> # packages without a significant decrease of performance and it could also
>> # be realized at early stages of the development process (see illustration
>> # below).
>>
>> # 3) ILLUSTRATION
>>
>> # Example case where base function 'parse.default' is overwritten:
>> parse(text="a<- 5") # Works
>> require(R.utils)
>> require(roxygen)
>> parse(text="a<- 5") # Does not work anymore
>>
>> ################# START A NEW R SESSION BEFORE YOU CONTINUE ####################
>>
>> # Inefficiency of 'namespace::fun()':
>> require(microbenchmark)
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(base::parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> # Can be made up by explicit assignment:
>> foo<- base::parse
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(foo(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> # Automatically prepend function names:
>> processNamespaces<- function(
>> do.global=FALSE,
>> do.verbose=FALSE,
>> .delim.name="_",
>> ...
>> ){
>> srch.list.0<- search()
>> srch.list<- gsub("package:", "", srch.list.0)
>> if(!do.global){
>> assign(".NS", new.env(), envir=.GlobalEnv)
>> }
>> out<- lapply(seq_along(srch.list), function(x.pkg){
>> pkg<- srch.list[x.pkg]
>>
>> # SKIP LIST
>> if(pkg %in% c(".GlobalEnv", "Autoloads")){
>> return(NULL)
>> }
>> # /
>>
>> # TARGET ENVIR
>> if(!do.global){
>> # ADD PACKAGE TO .NS ENVIRONMENT
>> envir<- eval(substitute(
>> assign(PKG, new.env(), envir=.NS),
>> list(PKG=pkg)
>> ))
>> # /
>> # envir<- get(pkg, envir=.NS, inherits=FALSE)
>> envir.msg<- paste(".NS$", pkg, sep="")
>> } else {
>> envir<- .GlobalEnv
>> envir.msg<- ".GlobalEnv"
>> }
>> # /
>>
>> # PROCESS FUNCTIONS
>> cnt<- ls(pos=x.pkg)
>> out<- unlist(sapply(cnt, function(x.cnt){
>> value<- get(x.cnt, pos=x.pkg, inherits=FALSE)
>> obj.mod<- paste(pkg, x.cnt, sep=.delim.name)
>> if(!is.function(value)){
>> return(NULL)
>> }
>> if(do.verbose){
>> cat(paste("Assigning '", obj.mod, "' to '", envir.msg,
>> "'", sep=""), sep="\n")
>> }
>> eval(substitute(
>> assign(OBJ.MOD, value, envir=ENVIR),
>> list(
>> OBJ.MOD=obj.mod,
>> ENVIR=envir
>> )
>> ))
>> return(obj.mod)
>> }))
>> names(out)<- NULL
>> # /
>> return(out)
>> })
>> names(out)<- srch.list
>> return(out)
>> }
>>
>> # +++++
>>
>> funs<- processNamespaces(do.verbose=TRUE)
>> ls(.NS)
>> ls(.NS$base)
>> .NS$base$base_parse
>>
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> #+++++
>>
>> funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE)
>> base_parse
>>
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(base_parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>