[Rd] Increase transparency: suggestion on how to avoid namespaces and/or unnecessary overwrites of existing functions
Dominick Samperi
djsamperi at gmail.com
Sat Oct 1 23:14:14 CEST 2011
On Sat, Oct 1, 2011 at 1:08 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 11-08-23 2:23 PM, Janko Thyson wrote:
>>
>> aDear list,
>>
>> I'm aware of the fact that I posted on something related a while ago,
>> but I just can't sweat this off and would like to ask your for an opinion:
>>
>> The problem:
>> Namespaces are great, but they don't resolve certain conflicts regarding
>> name clashes. There are more and more people out there trying to come up
>> with their own R packages, which is great also! Yet, it becomes more and
>> more likely that programmers will choose identical names for their
>> exported functions and/or that they add functionality to existing
>> function (i.e. overwriting existing functions).
>> The whole process of which packages overwrite which functions is
>> somewhat obscure and in addition depends on their order in the search
>> path. On the other hand, it is not possible to use "namespace"
>> functionality (i.e. 'namespace::fun()'; also less efficient than direct
>> call; see illustration below) during early stages of the development
>> process (i.e. the package is not finished yet) as there is no namespace
>> available yet.
>>
>
> I agree there can be a problem, but I don't think it is necessarily as
> serious as you suggest. Even though there are more and more packages
> available, most people will still use roughly the same number of them. Just
> because CRAN has thousands of packages doesn't mean I use all of them at the
> same time.
>
>
>> I know of at least two cases where such overwrites (I think it's called
>> masking, right?) led to some confusion at our chair:
>> 1) loading package forecast overwrites certain functions in stats which
>> made some code refactoring necessary
>
> If your code had been in a package with a NAMESPACE, it would not have been
> affected by a user loading forecast. (If you start importing it, then of
> course it could cause masking problems.)
>
> You suggest above that users only put code into a package very late in the
> development process. The solution is, don't do that. Create a package
> early on, and use it through the majority of development time.
>
> You can leave the choice of exports until late by exporting everything;
> you'll still get the benefit of the more controlled name search from the
> beginning.
>
> You say you can't use "namespace::call()" until the namespace package has
> been written. But why would you want to? If the call is coming from the
> new package, objects in it will be used with first priority in resolving the
> call. You only need the :: notation when there are ambiguities in calls to
> external packages.
>
>
>> 2) loading package 'R.utils' followed by package 'roxygen' overwrites
>> 'parse.default()' which results in errors for something like
>> 'eval(parse(text="a<- 1"))' ; see illustration below)
>> And I'm sure the community could come up with lots more of such scenarios.
>>
>> Suggestions:
>> 1) In order to avoid name clashes/unintended overwrites, how about
>> switching to a coding paradigm that explicitly (and automatically)
>> includes a package's name in all its functions' names once code is
>> turned into a real package? E.g., getting used to "preemptively" type
>> 'package_fun()' or 'package.fun()' instead of just 'fun()'. Better to be
>> save than sorry, right? This could be realized pretty easily (see
>> example below) and, IMHO, would significantly increase transparency.
>
> I think long names with consistent prefixes are harder to read than short
> descriptive names. I think this would make code harder to read. For
> example, the first few lines of mean.default would change from
>
> if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
> warning("argument is not numeric or logical: returning NA")
> return(NA_real_)
> }
>
> to
>
> if (!base_is.numeric(x) && !base_is.complex(x) &&
> !base_is.logical(x)) {
> base_warning("argument is not numeric or logical: returning NA")
> return(base_NA_real_)
> }
>
>
>> 2) In order to avoid intended (but for the user often pretty obscure)
>> overwrites of existing functions, we could use the same mechanism
>> together with the "rule": just don't provide any functions that
>> overwrite existing ones, rather prepend your version of that function
>> with your package name and leave it up to the user which version he
>> wants to call.
>
> That seems like good advice.
>
> Duncan Murdoch
Except that namespace::foo should be assigned to another local
variable instead of using package::foo in a tight loop, because
repeated calls to "::" can introduce a significant performance
penalty. (This has been discussed in another thread.)
>>
>> At the moment, all of this is probably not that big of a deal yet, but
>> my suggestion has more of a mid-term/long-term character.
>>
>> Below you find a little illustration. I'm probably asking too much, but
>> it'd be great if we could get a little discussion going on how to
>> improve the way of loading packages!
>>
>> Best regards and thanks for R and all it's packages!
>> Janko
>>
>>
>> ################################################################################
>> # PROOF OF CONCEPT
>>
>> ################################################################################
>>
>> # 1) PROBLEM
>> # IMHO, with the number of packages submitted to CRAN constantly
>> increasing,
>> # over time we will be likely to see problems with respect to name
>> clashes.
>> # The main reasons I see for this are the following:
>> # a) package developers picking identical names for their exported
>> functions
>> # b) package developers overwriting base functions in order to add
>> functionality
>> # to existing functions
>> # c) ...
>> #
>> # This can create scenarios in which the user might not exactly know that
>> # he/she is using a 'modified' version of a specific function. More so,
>> the user
>> # needs to carefully read the description of each new package he plans
>> # to use in order to find out which functions are exported and which
>> existing
>> # functions might be overwritten. This in turn might imply that the user's
>> # existing code needs to be refactored (i.e. instead of using 'fun()' it
>> # might now be necessary to type 'namespace::fun()' to be sure that the
>> desired
>> # function is called).
>>
>> # 2) SUGGESTED SOLUTION
>> # That being said, why don't we switch to a 'preemptive' coding paradigm
>> # where the default way of calling functions includes the specification of
>> # its namespace? In principle, the functionality offered by
>> 'namespace::fun()'
>> # gets the job done.
>> # BUT:
>> # a) it is slower compared to the direct way of calling a function.
>> # (see illustration below).
>> # b) this option is not available througout the development process of a
>> package
>> # as there is no namespace yet and there's no way to emulate one.
>> This in
>> # turn means that even though a package developer would buy into
>> strictly
>> # using 'mypkg::fun()' throughout his package code, he can only do so
>> at the
>> # very final stage of the process RIGHT before turning his code into a
>> # working package (when he's absolutely sure everything is working as
>> planned).
>> # For debugging he would need to go back to using 'fun()'. Pretty
>> cumbersome.
>>
>> # So how about simply automatically prepending a given function's name
>> with
>> # the package's name for each package that is build (e.g. 'pkg.fun()' or
>> # 'pkg_fun()')? In the end, this would just be a small change for new
>> packages
>> # without a significant decrease of performance and it could also be
>> realized
>> # at early stages of the development process (see illustration below).
>>
>> # 3) ILLUSTRATION
>>
>> # Example case where base function 'parse.default' is overwritten:
>> parse(text="a<- 5") # Works
>> require(R.utils)
>> require(roxygen)
>> parse(text="a<- 5") # Does not work anymore
>>
>> ################# START A NEW R SESSION BEFORE YOU CONTINUE
>> ####################
>>
>> # Inefficiency of 'namespace::fun()':
>> require(microbenchmark)
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(base::parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> # Can be made up by explicit assignment:
>> foo<- base::parse
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(foo(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> # Automatically prepend function names:
>> processNamespaces<- function(
>> do.global=FALSE,
>> do.verbose=FALSE,
>> .delim.name="_",
>> ...
>> ){
>> srch.list.0<- search()
>> srch.list<- gsub("package:", "", srch.list.0)
>> if(!do.global){
>> assign(".NS", new.env(), envir=.GlobalEnv)
>> }
>> out<- lapply(1:length(srch.list), function(x.pkg){
>> pkg<- srch.list[x.pkg]
>>
>> # SKIP LIST
>> if(pkg %in% c(".GlobalEnv", "Autoloads")){
>> return(NULL)
>> }
>> # /
>>
>> # TARGET ENVIR
>> if(!do.global){
>> # ADD PACKAGE TO .NS ENVIRONMENT
>> envir<- eval(substitute(
>> assign(PKG, new.env(), envir=.NS),
>> list(PKG=pkg)
>> ))
>> # /
>> # envir<- get(pkg, envir=.NS, inherits=FALSE)
>> envir.msg<- paste(".NS$", pkg, sep="")
>> } else {
>> envir<- .GlobalEnv
>> envir.msg<- ".GlobalEnv"
>> }
>> # /
>>
>> # PROCESS FUNCTIONS
>> cnt<- ls(pos=x.pkg)
>> out<- unlist(sapply(cnt, function(x.cnt){
>> value<- get(x.cnt, pos=x.pkg, inherits=FALSE)
>> obj.mod<- paste(pkg, x.cnt, sep=.delim.name)
>> if(!is.function(value)){
>> return(NULL)
>> }
>> if(do.verbose){
>> cat(paste("Assigning '", obj.mod, "' to '", envir.msg,
>> "'", sep=""), sep="\n")
>> }
>> eval(substitute(
>> assign(OBJ.MOD, value, envir=ENVIR),
>> list(
>> OBJ.MOD=obj.mod,
>> ENVIR=envir
>> )
>> ))
>> return(obj.mod)
>> }))
>> names(out)<- NULL
>> # /
>> return(out)
>> })
>> names(out)<- srch.list
>> return(out)
>> }
>>
>> # +++++
>>
>> funs<- processNamespaces(do.verbose=TRUE)
>> ls(.NS)
>> ls(.NS$base)
>> .NS$base$base_parse
>>
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> #+++++
>>
>> funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE)
>> base_parse
>>
>> res.a<- microbenchmark(eval(parse(text="a<- 5")))
>> res.b<- microbenchmark(eval(base_parse(text="a<- 5")))
>> median(res.a$time)/median(res.b$time)
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
More information about the R-devel
mailing list