[Bioc-devel] R6 class v.s. S4 class

Michael Lawrence lawrence.michael at gene.com
Fri Oct 20 18:39:24 CEST 2017


On Thu, Oct 19, 2017 at 9:23 PM, Chunlei Wu <cwu at scripps.edu> wrote:

> Thank you all for the feedback. Just to give some extra context, here we
> have the Python and Javascript versions of the biothings_client:
>
>
> https://github.com/biothings/biothings_client.py
>
> https://github.com/biothings/biothings_client.js
>
>
> And here is the work-in-progress R client:
>
>
> https://github.com/biothings/biothings_client.R
>
>
> You can find some examples from the README and the test code to see how
> the client works in Python and Javascript.
>
>
> One nice feature of both the Python and JS clients is that users can use
> the same client for any future "BioThings" API, including one created by
> another user rather than by us. In that case, one can work with a new API
> from the Python client like this:
>
>
> from biothings_client import get_client
>
> mything_client = get_client("mything", url="http://example.com/v1/api")
>  # could have some extra parameters
>
> mything_client.query(...)
>
> mything_client.get_mything(...)
>
> ...
>
>
> As the developers of all three biothings_clients, we would, of course,
> like to keep the same pattern for R, and R6 looks the closest to me. But
> from R users' perspective, this does not look like a popular pattern to use.
>

Yes, there will probably be way more users of R wanting to use BioThings
than BioThings users wanting to use R.


> With your suggestion, I think it can work this way in R:
>
>
> library(biothings)
>
> # a gene client with a preset config
> gene_client = BioThingsClient('gene')
>
> # whether the client stays the first argument is still TBD, based on the
> # previous pipe comment
> queryBioThings(gene_client, "CDK2")
>
>
> mything_client = BioThingsClient('mything', url = "http://example.com/v1/api")
>
> queryBioThings(mything_client, "something")
>
>
>
> Another thing I should mention: in the Python client, each client has these
> methods:
>
>
> gene_client.getgene
>
> gene_client.getgenes
>
> gene_client.query
>
> gene_client.querymany
>
> gene_client.metadata
>
>
> Then in R, we will have to create these generic methods (hope this is the
> right term):
>
>
> getBioThing(mything_client, ...)
>
> getBioThings
>

As Herve points out, R users will expect queries to be vectorized
implicitly. queryBioThings() or whatever should probably return a tabular
structure describing the things. There is no need for distinguishing
singular and plural.
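
As a rough sketch of what that could look like, a single vectorized
function can cover both the singular and the plural case. The toy index,
the column names, and this queryBioThings() signature are all made up for
illustration; none of this is the real biothings API, and a real client
would call the web service instead of an in-memory table.

```r
# Toy stand-in for a remote BioThings endpoint; terms and hits are made up.
toy_index <- data.frame(
  term = c("a", "a", "b"),
  hit  = c("hit1", "hit2", "hit3"),
  stringsAsFactors = FALSE
)

# Vectorized query: accepts any number of terms and always returns a
# data.frame whose first column is the query term.
queryBioThings <- function(client, terms) {
  stopifnot(is.character(terms))
  merge(
    data.frame(query = terms, stringsAsFactors = FALSE),
    client,
    by.x = "query", by.y = "term",
    all.x = TRUE   # terms without a hit are kept as NA rows (1:0 mapping)
  )
}

# One term with two hits, one with a single hit, one with none.
res <- queryBioThings(toy_index, c("a", "b", "zzz"))
print(res)
```

Note that merge() sorts by the key column, so the rows do not necessarily
come back in the order the terms were supplied.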

>
> queryBioThings
>
> queryManyBioThings
>
> BioThingsMetadata
>
>
> I personally still like the Python/JS pattern, as you can have
> client-specific names like "getgene" and "getgenes" instead of the generic
> getBioThing and getBioThings. Users can also shorten "gene_client" to "gc"
> or whatever, so there is much less to type :-) in the code. In the R S4
> case, the function names have to be more verbose because they are global.
>
>
>
There seems to be a misconception here. S4 has two types of classes:
conventional value classes and reference classes. The reference classes
have the same syntax as the R6 classes. R6 is mostly a stripped down
version of S4 reference classes. In this particular case, R is sufficiently
flexible that it would be easy to support the reference class syntax on
ordinary value classes. So you could use the reference class syntax, but I
wouldn't recommend it, for the aforementioned reasons. Moreover, be careful
about carrying over notions from Python and JS into R. R is unique in
fundamental ways.
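
To make the distinction concrete, here is a minimal sketch using only the
methods package. The class names, the url field, and the describe() generic
are hypothetical, chosen just for illustration.

```r
library(methods)

# Value class: behaviour lives in generics that dispatch on the object.
setClass("ValueClient", slots = c(url = "character"))
setGeneric("describe", function(x, ...) standardGeneric("describe"))
setMethod("describe", "ValueClient",
          function(x, ...) paste("client for", x@url))

# Reference class: methods belong to the object and are called with `$`.
RefClient <- setRefClass(
  "RefClient",
  fields = list(url = "character"),
  methods = list(describe = function() paste("client for", url))
)

v <- new("ValueClient", url = "http://example.com/v1/api")
r <- RefClient$new(url = "http://example.com/v1/api")

value_desc <- describe(v)    # functional call syntax
ref_desc   <- r$describe()   # object$method() call syntax

# The semantics differ, not just the syntax.
v2 <- v; v2@url <- "changed"   # value semantics: v keeps its original url
r2 <- r; r2$url <- "changed"   # reference semantics: r now sees "changed"
```

Both calls produce the same string, but after the last two lines v is
unchanged while r has been modified through r2.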

> Does this sound good to the group? Any more suggestions?
>
>
> Chunlei
>
>
> ------------------------------
> *From:* Michael Lawrence <lawrence.michael at gene.com>
> *Sent:* Thursday, October 19, 2017 8:32 PM
> *To:* Martin Morgan
> *Cc:* Charles Plessy; bioc-devel at r-project.org; Chunlei Wu
> *Subject:* Re: [Bioc-devel] R6 class v.s. S4 class
>
> API discoverability is a big problem in languages with a functional
> syntax. Namespaces are verbose, but they do provide for constrained
> autocompletion. Prefixing all symbols with an abbreviation like "bt_" seems
> too ad hoc to me, but it is common practice. Explicitly querying for methods
> takes the user out of the flow.
>
> One could imagine an IDE showing available methods in the tooltip of
> function symbols.
>
> I guess an IDE could support autocompleting on "(object)" or "(object,",
> where <tab> would display generics with applicable methods and fill in the
> name in front of the "(". Not very intuitive though.
>
> By simplifying our APIs we make discoverability less of an issue, because
> they are easily listed on cheat sheets and memorized.
>
> I wonder if there are ideas to steal from Julia.
>
> On Thu, Oct 19, 2017 at 7:36 PM, Martin Morgan <
> martin.morgan at roswellpark.org> wrote:
>
>> On 10/19/2017 09:24 PM, Charles Plessy wrote:
>>
>>> (Just sharing my thoughts as those days I am spending quite
>>> some time preparing the upgrade of a Bioconductor package).
>>>
>>> Le Fri, Oct 20, 2017 at 12:50:48AM +0000, Ryan Thompson a écrit :
>>>
>>>>
>>>> gene_client <- BioThingsClient("gene")
>>>> query("CDK2", client=gene_client)
>>>>
>>>
>>> In addition, since the piping operator (%>%) of dplyr and magrittr is
>>> gaining traction, I would recommend carefully considering which argument
>>> of the function comes first:
>>>
>>> With the client as first argument, one can then write things like:
>>>
>>>      gene_client %>% query("CDK2")  # similar to query(gene_client, "CDK2")
>>>
>>
>> The Bioconductor convention would use S4 objects with CamelCase
>> constructors.
>>
>>   geneClient = BioThingsGeneClient()  ## or just GeneClient()
>>
>> I agree with enabling the use of pipe, and think the generic + methods
>> should have signature where the first argument is the client rather than
>> the pattern against which the query occurs. There is to some extent an
>> argument for name-mangling in the generic (other knowledgeable people
>> disagree) so that one is free to implement contracts unique to the package
>> in question, and avoid conflicts with other generics with identical names
>> in different packages ( AnnotationDbi::select() / dplyr::select()).
>>
>>   setGeneric(
>>     "btQuery",
>>     function(x, query, ...) standardGeneric("btQuery")
>>   )
>>
>>   setMethod(
>>     "btQuery", "GeneClient",
>>     function(x, query)
>>   {
>>     ## implementation
>>   })
>>
>>   btQuery(geneClient, "CDK2")  ## maybe btquery(...)
>>
>> Yes, one could use BioThings::query() or
>> semanticallyInformativeAlternativeToQuery(), but these seem cumbersome to
>> me, and the first at least has rough edges (that of course should be
>> fixed...), e.g.,
>>
>>   > methods(AnnotationHub::query)
>>   Error in .S3methods(generic.function, class, parent.frame()) :
>>     no function 'AnnotationHub::query' is visible
>>
>> I think Michael is arguing for something like plain-old-functions (and
>> the original examples and problems of multiplying methods seemed somehow to
>> be plain old functions rather than S4 generics and methods?)
>>
>>   geneQuery <- function(x, query) ...
>>
>> A downside is that one cannot discover programmatically what one can do
>> with a GeneClient object (if it were a method, one could ask for
>> methods(class=class(geneClient))); as a developer one also needs to
>> validate the incoming argument, which requires a certain but not
>> insurmountable discipline.
>>
>> Michael didn't mention it, but these slides of his are relevant
>>
>>
>> https://bioconductor.org/help/course-materials/2017/BioC2017/DDay/BOF/usability.pdf
>>
>> One other lesson from the annotation world is to think carefully about
>> the structure of the return, in particular about 1:1 versus 1:many
>> mappings between a vector-valued 'pattern=' and its results. While it's
>> tempting to return, say, a character vector or named list, these days one
>> probably wants to take the lessons of tidy data and return a
>> data.frame-like object where the first column is the query and the second
>> and subsequent columns are the result of the query. (For example,
>> DataFrame(), but maybe that's not 'necessary'; nothing wrong with a
>> tibble, but a data.table is not likely necessary or particularly advised
>> because of the novel syntax and reference semantics.) One wants to pay
>> particular attention to dealing with 1:0 and 1:many mappings in ways that
>> do not confuse users; some use cases (e.g., adding annotations to the
>> rowData() of a SummarizedExperiment) are really facilitated by a 1:1
>> mapping between query and response.
>>
>> Martin
>>
>>
>>> With the gene symbol as first argument:
>>>
>>>      "CDK2" %>% query(gene_client)  # similar to query("CDK2", gene_client)
>>>
>>> If gene symbols may come as output from other commands and the query
>>> function is able to work smartly with a vector of gene symbols as input,
>>> then the second pattern might be useful.  Otherwise the first pattern
>>> probably makes more sense.
>>>
>>> See https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
>>> for details.
>>>
>>> (Note however that the piped and non-piped functions are not exactly
>>> equivalent, and that piped commands can be harder to debug; therefore
>>> it may be better to only use them in interactive sessions.)
>>>
>>> Have a nice day,
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
