[Rd] conflicted: an alternative conflict resolution strategy

Gabe Becker becker@g@be @ending from gene@com
Fri Aug 24 21:37:40 CEST 2018


Hadley,

Overall this seems like a cool and potentially really useful idea. I do have some
thoughts/feedback, which I've put inline below.

On Thu, Aug 23, 2018 at 11:31 AM, Hadley Wickham <h.wickham using gmail.com>
wrote:

>
> <snip>
>

> conflicted applies a few heuristics to minimise false positives (at the
> cost of introducing a few false negatives). The overarching goal is to
> ensure that code behaves identically regardless of the order in which
> packages are attached.
>
> -   A number of packages provide a function that appears to conflict
>     with a function in a base package, but they follow the superset
>     principle (i.e. they only extend the API, as explained to me by
>     Hervé Pagès).
>
>     conflicted assumes that packages adhere to the superset principle,
>     which appears to be true in most of the cases that I’ve seen.


It seems that you may be able to strengthen this heuristic from a blanket
assumption to something more narrowly targeted, by looking for one or more
of the following signs that a function likely adheres to the superset principle:

   1. matching or purely extending formals (i.e. all the named arguments of
   base::fun appear in pkg::fun in the same order, and pkg::fun adds new
   arguments only if base::fun takes ...)
   2. an explicit call to base::fun in the body of pkg::fun
   3. a UseMethod(funname) call, with at least one provided S3 method that
   calls base::fun
   4. creation of an S4 generic that uses fun or base::fun as the default
   method body, or that calls base::fun from at least one method
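
A rough sketch of how heuristics 1 and 2 could be checked with base R alone.
`extends_formals()` and `calls_base_fun()` are hypothetical helpers, not part
of conflicted, and the formals check encodes just one reading of heuristic 1:
every argument of base::fun appears in pkg::fun in the same relative order,
and extra arguments are accepted only when base::fun takes `...`.

```r
# Hypothetical helpers sketching superset heuristics 1 and 2;
# they are NOT part of the conflicted package.

# Heuristic 1: pkg_fun keeps all of base_fun's named arguments in the same
# relative order, and only adds new arguments if base_fun takes `...`.
# args() is used so that primitives (whose formals() is NULL) also work.
extends_formals <- function(base_fun, pkg_fun) {
  base_args <- as.character(names(formals(args(base_fun))))
  pkg_args <- as.character(names(formals(args(pkg_fun))))
  if (!all(base_args %in% pkg_args)) return(FALSE)
  same_order <- identical(pkg_args[pkg_args %in% base_args], base_args)
  has_extras <- length(setdiff(pkg_args, base_args)) > 0
  same_order && (!has_extras || "..." %in% base_args)
}

# Heuristic 2: does pkg_fun's body contain an explicit base::<fun_name>(...)
# call? Walks the parsed body rather than matching text.
calls_base_fun <- function(pkg_fun, fun_name) {
  target <- call("::", as.name("base"), as.name(fun_name))
  walk <- function(expr) {
    if (!is.call(expr)) return(FALSE)
    if (identical(expr[[1]], target)) return(TRUE)
    parts <- as.list(expr)
    for (i in seq_along(parts)) {
      # Only recurse into sub-calls; this also skips empty arguments as in x[, 1]
      if (is.call(parts[[i]]) && walk(parts[[i]])) return(TRUE)
    }
    FALSE
  }
  walk(body(pkg_fun))
}
```

Heuristics 3 and 4 would presumably need the same kind of body walk, applied
to the registered S3 or S4 methods rather than to pkg::fun itself.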



>     For example, the lubridate package provides `as.difftime()` and
>     `date()`, which extend the behaviour of base functions, and provides
>     S4 generics for the set operators.
>
>         conflict_scout(c("lubridate", "base"))
>         #> 5 conflicts:
>         #> * `as.difftime`: [lubridate]
>         #> * `date`       : [lubridate]
>         #> * `intersect`  : [lubridate]
>         #> * `setdiff`    : [lubridate]
>         #> * `union`      : [lubridate]
>
>     There are two popular functions that don’t adhere to this principle:
>     `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>     special cases so they correctly generate conflicts. (I sure wish I’d
>     known about the superset principle when creating dplyr!)
>
>         conflict_scout(c("dplyr", "stats"))
>         #> 2 conflicts:
>         #> * `filter`: dplyr, stats
>         #> * `lag`   : dplyr, stats
>
> -   Deprecated functions should never win a conflict, so conflicted
>     checks for use of `.Deprecated()`. This rule is very useful when
>     moving functions from one package to another. For example, many
>     devtools functions were moved to usethis, and conflicted ensures
>     that you always get the non-deprecated version, regardless of package
>     attach order:
>

I can completely believe this rule is useful for refactoring as you
describe, but that is the "same function" case. For an end user in the
"different function, same symbol" case, it's not at all clear to me that the
deprecated function should always lose.

People sometimes use deprecated functions. It's not great, and eventually
they'll need to fix each such case, but imagine if you deprecated
the filter verb in dplyr (I know this will never happen, but I think it's
illustrative nonetheless).

Consider a piece of code someone wrote before this hypothetical deprecation
of filter. The fact that it's now deprecated certainly doesn't mean that
they secretly wanted stats::filter all along, right? If conflicted acts as
if it does, they will get exactly the kind of error you're trying to
protect them from, with even less ability to understand why, because
they were already doing "the right thing" to protect themselves by using
conflicted in the first place.
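
For concreteness, a purely syntactic version of the `.Deprecated()` check
might look like the sketch below. `is_deprecated()` is a hypothetical name
and not necessarily how conflicted implements it. Note that such a check can
only say that a function is deprecated; it says nothing about which function
the user actually meant, which is the concern above.

```r
# Hypothetical helper (not conflicted's actual implementation): flag a
# function as deprecated if the symbol `.Deprecated` appears anywhere in its
# body. all.names() returns every symbol in an expression, including the
# names of called functions, so base::.Deprecated(...) is caught as well.
is_deprecated <- function(fun) {
  ".Deprecated" %in% all.names(body(fun))
}
```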


> Finally, as mentioned above, the user can declare preferences:
>
>     conflict_prefer("select", "MASS")
>     #> [conflicted] Will prefer MASS::select over any other package
>     conflict_scout(c("dplyr", "MASS"))
>     #> 1 conflict:
>     #> * `select`: [MASS]
>
>

I deeply worry about people putting this kind of thing, or even just
library(conflicted), in their .Rprofile and thus making their scripts
*substantially* less reproducible. Is that a consequence of this kind of
functionality that you have thought about?

Best,
~G


> I’d love to hear what people think about the general idea, and if there
> are any obviously missing pieces.
>
> Thanks!
>
> Hadley
>
>
> --
> http://hadley.nz
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research



