[Bioc-devel] R6 class v.s. S4 class

Michael Lawrence lawrence.michael at gene.com
Fri Oct 20 04:10:38 CEST 2017


On Thu, Oct 19, 2017 at 5:25 PM, Chunlei Wu <cwu at scripps.edu> wrote:

> Hello BioC-dev group,
>
>
> We are working on a new R package right now and plan to submit
> it to Bioconductor soon.  It's a unified R client for the collection of
> BioThings APIs (http://biothings.io). Using an R6 class makes a lot of
> sense to me, as I'm coming from Python's OOP experience. It will be used
> like this:
>
>
> library(biothings)
>
> gene_client <- BioThingsR6$new("gene")
> gene_client$query("CDK2")
>
>
> variant_client <- BioThingsR6$new("variant")
> variant_client$query("dbsnp.rsid:rs1000")
>
> Each "client" above corresponds to a specific BioThings API, e.g. one
> for gene and one for variant. We will have more "clients" as we
> expand the number of BioThings APIs. The same R code should work with
> future APIs.
>
> But if we use traditional S4 classes, it will be awkward: functions and
> methods are not "namespaced", so we will need to define new
> functions for each additional API. Something like this:
>
> library(biothings)
> geneQuery("CDK2")
> variantQuery("dbsnp.rsid:rs1000")
>
>
If we ignore the mutability aspect, the difference here is only syntax.

gene_client$query("CDK2") <-> query(gene_client, "CDK2")
variant_client$query("dbsnp.rsid:rs1000") <-> query(variant_client,
"dbsnp.rsid:rs1000")

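To make the equivalence concrete, here is a minimal sketch of the
functional form using an S4 generic. The class name BioThingsClient, its
"api" slot, and the string that query() returns are assumptions made for
illustration; a real method would call the web service instead of pasting
a string together.

```r
library(methods)

# Hypothetical client class: one slot naming the API ("gene", "variant", ...)
setClass("BioThingsClient", slots = c(api = "character"))

# Functional counterpart of client$query(...)
setGeneric("query", function(client, q, ...) standardGeneric("query"))

setMethod("query", "BioThingsClient", function(client, q, ...) {
  # Placeholder: describe the request instead of contacting the service
  paste0(client@api, "?q=", q)
})

gene_client <- new("BioThingsClient", api = "gene")
variant_client <- new("BioThingsClient", api = "variant")

gene_result <- query(gene_client, "CDK2")
variant_result <- query(variant_client, "dbsnp.rsid:rs1000")
```

One generic covers every client, so adding an API does not require a new
function name.
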
The problem in both APIs is that "query" is too generic; it's semantically
poor. You're depending on the user choosing an informative name for the
client in order to know what type of thing is being returned. Using
explicitly named functions helps prevent this. Presumably BioThings
already has some sort of schema for each "thing", and the interface should
correspond to it.

It's true that the functional syntax has the potential for symbol
collisions, but that's what namespaces are for. If at all possible though,
use the collision to your advantage and set methods on existing generics.
For example, genes() and transcripts() from GenomicFeatures are probably
relevant.
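
As a rough sketch of setting a method on an existing generic: the block
below defines a stand-in genes() generic only if none is already visible,
so that it runs without GenomicFeatures installed. In a real package the
generic would be imported rather than redefined. The GeneClient class, its
"endpoint" slot, and the returned string are hypothetical.

```r
library(methods)

# Stand-in generic, created only when no genes() generic is visible.
# A real package would import the existing generic instead.
if (!isGeneric("genes")) {
  setGeneric("genes", function(x, ...) standardGeneric("genes"))
}

# Hypothetical client class for the gene API
setClass("GeneClient", slots = c(endpoint = "character"))

setMethod("genes", "GeneClient", function(x, q, ...) {
  # Placeholder: describe the request instead of contacting the service
  paste0(x@endpoint, "?q=", q)
})

gene_hits <- genes(new("GeneClient", endpoint = "gene"), q = "CDK2")
```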

But the mutable/reference semantics do matter, unless this is a read-only
API. Even if it were, the message-passing syntax (in R anyway) is
unfamiliar to virtually every R user. But even if you're not convinced by
all of that, at least use S4 reference classes, not R6, so that there is
some level of integration, and you can take advantage of all the other S4
features.
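
For illustration, a minimal sketch of the same kind of client written as a
reference class with setRefClass() from the methods package. The class
name, its fields, and the apiName() generic are invented for this example,
and query() only records the query string rather than contacting a
service.

```r
library(methods)

# Hypothetical reference class; "history" exists only to show mutation
BioThingsRefClient <- setRefClass("BioThingsRefClient",
  fields = list(api = "character", history = "character"),
  methods = list(
    query = function(q) {
      history <<- c(history, q)  # modifies the object in place
      paste0(api, "?q=", q)      # placeholder for a real request
    }
  )
)

gene_client <- BioThingsRefClient$new(api = "gene")
alias <- gene_client             # a second reference, not a copy
res <- alias$query("CDK2")
seen <- gene_client$history      # the change is visible through both names

# The generator also defines an S4 class, so S4 generics can dispatch on it
setGeneric("apiName", function(x) standardGeneric("apiName"))
setMethod("apiName", "BioThingsRefClient", function(x) x$api)
client_api <- apiName(gene_client)
```

The message-passing syntax is kept, but the object can still take part in
S4 dispatch.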

> I also want to mention that "query" is not the only method for each API
> client, there will be several other methods for each client. It will
> quickly make the function names messy if we go with the S4 option.
>
> Anyway, we think we like R6 classes better, but we want to get some
> feedback here on whether this usage pattern with R6 classes is well
> accepted in the R community. Will users find it cumbersome to instantiate
> the class first and then make the function calls? The majority of
> existing BioC packages are indeed based on S4 classes, which makes us
> hesitant.
>
> Thanks,
>
> Chunlei