[Bioc-devel] R6 class v.s. S4 class

Michael Lawrence lawrence.michael at gene.com
Fri Oct 20 05:32:40 CEST 2017


API discoverability is a big problem in languages with a functional syntax.
Namespaces are verbose, but they do provide for constrained autocompletion.
Prefixing all symbols with an abbreviation like "bt_" seems too ad hoc to
me, but it is common practice. Explicitly querying for methods takes the
user out of the flow.

One could imagine an IDE showing available methods in the tooltip of
function symbols.

I guess an IDE could support autocompleting on "(object)" or "(object,",
where <tab> would display generics with applicable methods and fill in the
name in front of the "(". Not very intuitive though.
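
As a rough illustration of the lookup such an IDE feature would need, here
is a minimal sketch. The GeneClient class, the btQuery generic, and the
applicableGenerics() helper are all hypothetical and exist only for this
example; hasMethod() is from the methods package.

```r
library(methods)

## Hypothetical class and generic, defined only for this illustration.
setClass("GeneClient", representation(endpoint = "character"))
setGeneric("btQuery", function(x, query, ...) standardGeneric("btQuery"))
setMethod("btQuery", "GeneClient", function(x, query, ...) {
    paste(x@endpoint, query, sep = ":")
})

## Among the function names in `candidates`, keep those that are S4 generics
## with a method applicable (directly or by inheritance) to class(object).
applicableGenerics <- function(object, candidates) {
    keep <- vapply(candidates, function(f) {
        hasMethod(f, class(object))
    }, logical(1))
    candidates[keep]
}

client <- new("GeneClient", endpoint = "gene")
applicableGenerics(client, c("btQuery", "nchar"))
```

An editor would still have to decide where the candidate names come from
(attached packages, say) and how to rank them, which this sketch leaves out.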

By simplifying our APIs we make discoverability less of an issue, because
they are easily listed on cheat sheets and memorized.

I wonder if there are ideas to steal from Julia.

On Thu, Oct 19, 2017 at 7:36 PM, Martin Morgan <
martin.morgan at roswellpark.org> wrote:

> On 10/19/2017 09:24 PM, Charles Plessy wrote:
>
>> (Just sharing my thoughts, as these days I am spending quite
>> some time preparing the upgrade of a Bioconductor package.)
>>
>> On Fri, Oct 20, 2017 at 12:50:48AM +0000, Ryan Thompson wrote:
>>
>>>
>>> gene_client <- BioThingsClient("gene")
>>> query("CDK2", client=gene_client)
>>>
>>
>> In addition, since the piping operator (%>%) of dplyr and magrittr is
>> gaining traction, I would recommend carefully considering which argument
>> will be the first argument of the function:
>>
>> With the client as first argument, one can then write things like:
>>
>>      gene_client %>% query("CDK2")  # similar to query(gene_client, "CDK2")
>>
>
> The Bioconductor convention would use S4 objects with CamelCase
> constructors.
>
>   geneClient = BioThingsGeneClient()  ## or just GeneClient()
>
> I agree with enabling the use of the pipe, and think the generic + methods
> should have a signature where the first argument is the client rather than
> the pattern against which the query occurs. There is to some extent an
> argument for name-mangling in the generic (other knowledgeable people
> disagree) so that one is free to implement contracts unique to the package
> in question, and avoid conflicts with other generics with identical names
> in different packages (AnnotationDbi::select() / dplyr::select()).
>
>   setGeneric(
>     "btQuery",
>     function(x, query, ...) standardGeneric("btQuery")
>   )
>
>   setMethod(
>     "btQuery", "GeneClient",
>     function(x, query, ...)
>     {
>       ## implementation
>     })
>
>   btQuery(geneClient, "CDK2")  ## maybe btquery(...)
>
> Yes, one could use BioThings::query(), or semanticallyInformativeAlternativeToQuery(),
> but these seem cumbersome to me, and the first at least has rough edges
> (that of course should be fixed...), e.g.,
>
>   > methods(AnnotationHub::query)
>   Error in .S3methods(generic.function, class, parent.frame()) :
>     no function 'AnnotationHub::query' is visible
>
> I think Michael is arguing for something like plain-old-functions (and the
> original examples and problems of multiplying methods seemed somehow to be
> plain old functions rather than S4 generics and methods?)
>
>   geneQuery <- function(x, query) ...
>
> A downside is that one cannot discover programmatically what one can do
> with a GeneClient object (if it were a method, one could ask for
> methods(class=class(geneClient))); as a developer one also needs to
> validate the incoming argument, which requires a certain but not
> insurmountable discipline.
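
To make the validation point concrete, here is a minimal sketch of a plain
function that checks its own arguments. The GeneClient class, its endpoint
slot, and the body of geneQuery() are assumptions made for illustration;
the function only builds a string and does not contact any service.

```r
library(methods)

## Hypothetical client class; the 'endpoint' slot is an assumption.
setClass("GeneClient", representation(endpoint = "character"))

## Plain function: with no method dispatch to reject a wrong 'x', the
## checks have to be written by hand.
geneQuery <- function(x, query) {
    stopifnot(
        is(x, "GeneClient"),
        is.character(query),
        length(query) >= 1L,
        !anyNA(query)
    )
    ## Placeholder implementation, standing in for a real request.
    paste(x@endpoint, query, sep = "/")
}

client <- new("GeneClient", endpoint = "gene")
geneQuery(client, "CDK2")   # "gene/CDK2"
```

Calling geneQuery("gene", "CDK2") fails in stopifnot(), whereas an S4
method would have failed at dispatch without any hand-written checks.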
>
> Michael didn't mention it, but these slides of his are relevant:
>
> https://bioconductor.org/help/course-materials/2017/BioC2017/DDay/BOF/usability.pdf
>
> One other lesson from the annotation world is to think carefully about the
> structure of the return value, in particular about 1:1 versus 1:many
> mappings between a vector-valued 'pattern=' and the results. While it's
> tempting to return, say, a character vector or named list, these days one
> probably wants to take the lessons of tidy data and return a
> data.frame-like object in which the first column is the query and the
> second and subsequent columns are the result of the query. (DataFrame()
> would do, but maybe that's not 'necessary'; there is nothing wrong with a
> tibble, but a data.table is not likely necessary or particularly advised
> because of the novel syntax and reference semantics.) One wants to pay
> particular attention to dealing with 1:0 and 1:many mappings in ways that
> do not confuse users. Some use cases (e.g., adding annotations to the
> rowData() of a SummarizedExperiment) are really facilitated by a 1:1
> mapping between query and response.
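
A minimal sketch of that return shape, using a base data.frame rather than
DataFrame() to stay dependency-free. The lookup table, the GO identifiers in
it, and the functions queryLong() and queryFirst() are made up for
illustration and stand in for a real annotation resource.

```r
## Hypothetical lookup table standing in for an annotation service.
annotation <- list(
    CDK2 = c("GO:0000082", "GO:0006468"),   # 1:many
    TP53 = "GO:0006915"                     # 1:1
)

## Long format: one row per (query, result) pair. A query with no hit (1:0)
## is kept as a single row with NA, so it never silently disappears.
queryLong <- function(queries, mapping) {
    hits <- lapply(queries, function(q) {
        value <- mapping[[q]]
        if (is.null(value)) NA_character_ else value
    })
    data.frame(
        query = rep(queries, lengths(hits)),
        result = unlist(hits, use.names = FALSE),
        stringsAsFactors = FALSE
    )
}

## 1:1 format: keep the first result per query, so that row i of the output
## corresponds to queries[i]. Assumes 'queries' has no duplicates.
queryFirst <- function(queries, mapping) {
    long <- queryLong(queries, mapping)
    long[!duplicated(long$query), , drop = FALSE]
}

queries <- c("CDK2", "TP53", "NOPE")
queryLong(queries, annotation)    # 4 rows: CDK2 twice, TP53 once, NOPE as NA
queryFirst(queries, annotation)   # 3 rows, in the same order as 'queries'
```

Taking the first hit is only one way to force a 1:1 mapping; collapsing the
hits into a list column is another, and which is less confusing depends on
the use case.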
>
> Martin
>
>
>> With the gene symbol as first argument:
>>
>>      "CDK2" %>% query(gene_client)  # similar to query("CDK2", gene_client)
>>
>> If gene symbols may come as output from other commands and the query
>> function is able to work smartly with a vector of gene symbols as input,
>> then the second pattern might be useful.  Otherwise the first pattern
>> probably makes more sense.
>>
>> See https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
>> for details.
>>
>> (Note however that the piped and non-piped functions are not exactly
>> equivalent, and that piped commands can be harder to debug; therefore
>> it may be better to only use them in interactive sessions.)
>>
>> Have a nice day,
>>
>>
>