[BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named
Hervé Pagès
hpages at fhcrc.org
Fri Dec 14 01:15:01 CET 2012
On 12/13/2012 12:24 PM, Cook, Malcolm wrote:
> Thanks for digging into this, Herve, Michael.
>
> Herve, I really appreciate your following up on R-devel, such as you
> recently did that got mapply ‘fixed’ to work natively with Bioc’s List
> and friends (cf. http://developer.r-project.org/blosxom.cgi/R-devel/NEWS)
>
> I don’t think re-defining c as a generic in Bioconductor is a good
> workaround, for the reasons you mentioned, Herve. The issue will just
> crop up again with someone else’s non-BioC S4 class structure.
>
> It really is not a Bioconductor issue at all.
>
> If this can also be kicked upstream, that would serve others as well.
Glups, I wrote a long answer about the pros and cons of putting stuff
in BiocGenerics vs trying to push it into mainstream R. I was about to
press the Send button but, before doing so, decided to have a quick
look at the source of the methods package (following Michael's suggestion)
just to confirm my feeling that this would be a tough one, so tough
that my previous workaround would suddenly sound much more appealing.
I was psychologically and emotionally prepared to have a rough time,
but, surprisingly, I didn't. Here is the patch:
hpages at thinkpad:~/biocprojects/c_implicit_generic/R-devel$ svn diff
Index: src/library/methods/R/BasicFunsList.R
===================================================================
--- src/library/methods/R/BasicFunsList.R (revision 61310)
+++ src/library/methods/R/BasicFunsList.R (working copy)
@@ -46,7 +46,7 @@
 , "%*%" = function(x, y) standardGeneric("%*%")
 , "xtfrm" = function(x) standardGeneric("xtfrm")
 ### these have a different arglist from the primitives
-, "c" = function(x, ..., recursive = FALSE) standardGeneric("c")
+, "c" = function(..., recursive = FALSE) standardGeneric("c")
 , "all" = function(x, ..., na.rm = FALSE) standardGeneric("all")
 , "any" = function(x, ..., na.rm = FALSE) standardGeneric("any")
 , "sum" = function(x, ..., na.rm = FALSE) standardGeneric("sum")
Yes, a one-liner! I have done very little testing so far, but it seems
to work fine :-) I'll do more testing before I send this to R-devel.
Thanks for the encouragement.
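
One way to check how a given R build treats named arguments to c() is to
compare a named and an unnamed call on a toy class. This is only a sketch:
the class "Toy" and its slot are made up for illustration, and the result
of the named call depends on the R version, so no output is shown for it.

```r
library(methods)

# Hypothetical toy class, used only for this check.
setClass("Toy", representation(aa = "integer"))

# A "c" method written against the (x, ...) argument list of the implicit generic.
setMethod("c", "Toy",
    function(x, ..., recursive = FALSE) {
        args <- list(x, ...)
        new("Toy", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    }
)

t1 <- new("Toy", aa = 1:3)
t2 <- new("Toy", aa = 22:25)

unnamed <- c(t1, t2)          # the first argument lands in 'x', so the method is used
named   <- c(a = t1, b = t2)  # whether the method is used here depends on the R version

class(unnamed)
class(named)
```

If class(named) reports "list", the installed R still falls back to the
default method when no argument is matched to 'x'.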
H.
>
> Thoughts?
>
> ~Malcolm
>
> From: Michael Lawrence [lawrence.michael at gene.com]
> Sent: Thursday, December 13, 2012 11:13 AM
> To: Hervé Pagès
> Cc: Cook, Malcolm; Michael Lawrence; bioconductor at r-project.org
> Subject: Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
> IRangesList returns "list" only when the list is named
>
> Probably better to bring this issue to the attention of John Chambers.
> Since he's invited us to start hacking on the methods package, this
> might be a good opportunity to smooth out some of these rough edges.
>
>
> Michael
>
> On Wed, Dec 12, 2012 at 6:46 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Malcolm,
>
> I'm not sure what the reasons are for the current behaviour
> of the c() generic, if they're just historical, or if there
> is something deeper, or...
>
> My view on the "primitive" status of a function is that it should
> be an implementation detail, maybe an important one, but a
> detail anyway in the sense that being implemented as a .Primitive
> or an .Internal or just in plain R should not affect the semantics
> of a function. Interestingly, there is a short comment in ?.Primitive
> suggesting that people's code should not depend on knowing which
> functions are primitive because this does change as R evolves.
> Unfortunately the reality is very different: there are situations
> where you definitely need to know that something is a primitive,
> just because argument passing (and consequently method dispatch)
> works differently.
>
> On a more positive note, I found a hack that allows c() to dispatch
> on ...:
>
> setGeneric("c", signature="...",
>     function(..., recursive=FALSE)
>         standardGeneric("c"),
>     useAsDefault=function(..., recursive=FALSE)
>         base::c(..., recursive=recursive)
> )
>
> Then:
>
> setClass("A", representation(aa="integer"))
>
> setMethod("c", "A",
>     function(..., recursive=FALSE)
>     {
>         args <- list(...)
>         ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
>         new("A", aa=ans_aa)
>     }
> )
>
> > a1 <- new("A", aa=1:3)
> > a2 <- new("A", aa=22:25)
>
> > c(a1, a2)
> An object of class "A"
> Slot "aa":
> [1] 1 2 3 22 23 24 25
>
> > c(a1, x=a2)
> An object of class "A"
> Slot "aa":
> [1] 1 2 3 22 23 24 25
>
> > c(A=a1, B=a2)
> An object of class "A"
> Slot "aa":
> [1] 1 2 3 22 23 24 25
>
> Overriding base::c() with our own c() is pretty invasive though and
> I didn't test it enough to guarantee that it doesn't break things or
> slow them down.
>
> Also one important thing to note is that this signature doesn't
> allow specific methods to implement extra arguments (like the "c"
> method for GenomicRanges does), which kind of makes sense because
> the generic function is putting named args that are not named
> 'recursive' in ..., and dispatches on them. The same restriction
> applies to the cbind() and rbind() generics:
>
> > setMethod("cbind", "A", function(..., deparse.level=1,
> my.toggle=FALSE) NULL)
> Creating a generic function for ‘cbind’ from package ‘base’ in the
> global environment
> in method for ‘cbind’ with signature ‘"A"’: no definition for class “A”
> Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
> arguments (deparse.level) after '...' in the generic must appear in
> the method, in the same place at the end of the argument list
>
> So some of the "c" methods would need to be revisited.
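
The restriction on trailing arguments can be reproduced without touching
cbind() itself. The following is a minimal sketch, assuming a made-up
generic "bindAll" with the same shape as the cbind()/rbind() generics and a
made-up class "Toy"; neither name comes from any package.

```r
library(methods)

# Hypothetical toy class.
setClass("Toy", representation(aa = "integer"))

# Hypothetical generic: dispatch on '...', followed by one trailing argument.
setGeneric("bindAll",
    function(..., deparse.level = 1) standardGeneric("bindAll"),
    signature = "..."
)

# A method whose argument list matches the generic is accepted.
setMethod("bindAll", "Toy",
    function(..., deparse.level = 1) "ok"
)

# A method that appends an extra argument after 'deparse.level' is rejected
# when it is defined; capture the error instead of aborting.
err <- tryCatch(
    setMethod("bindAll", "Toy",
        function(..., deparse.level = 1, my.toggle = FALSE) NULL),
    error = function(e) e
)

inherits(err, "error")
conditionMessage(err)
```

The first method stays registered after the failed definition, so
bindAll(a = new("Toy", aa = 1L)) still dispatches to it.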
>
> Anyway, this would need serious testing before adding this generic to
> BiocGenerics. Is it worth it?
>
> Cheers,
> H.
>
>
>
>
> On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
>
> Steve, Michael, Herve, all
>
> As always, “illuminating”.
>
> And, as often, frustrating.
>
> I am clear how unname serves as a workaround for my current purpose.
> So, I can proceed.
>
> But, I remain unclear whether this (to me, odd) behavior of `base::c` is
> desirable or justifiable in any sense of the word. Is this informed by
> a rational language design or, as Mike suggests, the result of layering
> an OO design onto a functional base?
>
> In your opinion, should this issue be raised on R-devel? Or is it a
> “waste of time”?
>
> Thanks for your thoughts/help.
>
> ~Malcolm
>
> From: Michael Lawrence [lawrence.michael at gene.com]
> Sent: Monday, December 03, 2012 11:31 AM
> To: Hervé Pagès
> Cc: Cook, Malcolm; bioconductor at r-project.org
> Subject: Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
> IRangesList returns "list" only when the list is named
>
> On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Malcolm,
>
> The problem you are describing can be reproduced by calling c()
> directly on S4 objects.
>
> * With unnamed arguments:
>
> > c(IRanges(), IRanges())
> IRanges of length 0
>
> > c(Rle(), Rle())
> logical-Rle of length 0 with 0 runs
> Lengths:
> Values :
>
> * With named arguments:
>
> > c(a=IRanges(),b=IRanges())
> $a
> IRanges of length 0
>
> $b
> IRanges of length 0
>
> > c(a=Rle(), b=Rle())
> $a
> logical-Rle of length 0 with 0 runs
> Lengths:
> Values :
>
> $b
> logical-Rle of length 0 with 0 runs
> Lengths:
> Values :
>
> This statement, found in the man page for base::c(), shows the root
> of the problem:
>
> S4 methods:
>
> This function is S4 generic, but with argument list ‘(x, ...,
> recursive = FALSE)’.
>
> Note that, to make things a little bit more confusing, it's not totally
> accurate that c() is an S4 generic, at least not on a fresh session:
>
> > isGeneric("c")
> [1] FALSE
>
> So my understanding of the above statement is that c() will
> automatically be turned into an S4 generic at the moment you try
> to define an S4 method for it, and, for obscure reasons that I'm not
> sure I understand, the argument list used in the definition of this
> S4 method must start with 'x'. The consequence of all this is that
> dispatch will happen on 'x' so if named arguments are passed with
> a name that is not 'x', dispatch will fail and the default method
> (which is base::c()) will be called :-b
>
> This explains why things work as expected in the following situations:
>
> > c(IRanges(), b=IRanges())
> IRanges of length 0
>
> > c(a=IRanges(), IRanges())
> IRanges of length 0
>
> > c(a=IRanges(), x=IRanges())
> IRanges of length 0
>
> But when all the arguments are named with names != 'x', then nothing
> is passed to 'x' and dispatch fails.
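
The argument-matching behaviour described above can be imitated with an
ordinary (non-primitive) generic. This is a sketch under assumptions: the
generic "glueAll" and the class "Toy" are invented for illustration, and the
"missing" method stands in for the sealed default of c().

```r
library(methods)

# Hypothetical toy class.
setClass("Toy", representation(aa = "integer"))

# Hypothetical closure generic with the (x, ..., recursive) argument list.
# By default it can dispatch on 'x' and 'recursive', never on '...'.
setGeneric("glueAll",
    function(x, ..., recursive = FALSE) standardGeneric("glueAll")
)

setMethod("glueAll", "Toy",
    function(x, ..., recursive = FALSE) "Toy method")

# Stand-in for the default: selected when nothing is matched to 'x'.
setMethod("glueAll", "missing",
    function(x, ..., recursive = FALSE) "fallback")

t1 <- new("Toy", aa = 1:3)

r1 <- glueAll(t1, b = t1)      # "Toy method": the unnamed argument fills 'x'
r2 <- glueAll(a = t1, x = t1)  # "Toy method": 'x' is matched by name
r3 <- glueAll(a = t1, b = t1)  # "fallback": both arguments end up in '...'
```

Only the last call leaves 'x' empty, which is the same situation as
c(a=IRanges(), b=IRanges()).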
>
> I didn't have much luck so far with my attempts to work around this:
>
> 1. Trying to change the signature of the c() generic:
>
> > setGeneric("c", signature="...")
> Error in setGeneric("c", signature = "...") :
> ‘c’ is a primitive function; methods can be defined, but
> the generic function is implicit, and cannot be changed.
>
> 2. Trying to dispatch on "missing" or "ANY":
>
> > setMethod("c", "missing", function(x, ..., recursive=FALSE)
> "YES!")
> Error in setMethod("c", "missing", function(x, ..., recursive =
> FALSE) "YES!") :
> the method for function ‘c’ and signature x="missing" is sealed
> and cannot be re-defined
>
> > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
> Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE)
> "YES!") :
> the method for function ‘c’ and signature x="ANY" is sealed and
> cannot be re-defined
>
> With old versions of R, dispatch on ... was not possible, i.e. ... was not
> allowed to be in the signature of the generic. This was changed in
> recent versions of R, and we're already using this new feature for a
> few S4 generics defined in BiocGenerics, e.g. for cbind() and rbind():
>
> > library(BiocGenerics)
> > rbind
> standardGeneric for "rbind" defined from package "BiocGenerics"
>
> function (..., deparse.level = 1)
> standardGeneric("rbind")
> <environment: 0x29b96b0>
> Methods may be defined for arguments: ...
> Use showMethods("rbind") for currently available ones.
>
> And dispatch works as expected, with or without named arguments:
>
> > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
> DataFrame with 6 rows and 2 columns
>           X         Y
>   <integer> <integer>
> 1         1        11
> 2         2        12
> 3         3        13
> 4         1        21
> 5         2        22
> 6         3        23
>
> > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
> DataFrame with 6 rows and 2 columns
>           X         Y
>   <integer> <integer>
> 1         1        11
> 2         2        12
> 3         3        13
> 4         1        21
> 5         2        22
> 6         3        23
>
> So I wonder if the weird behavior of c() is still justified.
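
Dispatch on '...' can also be tried without BiocGenerics. The sketch below
assumes a made-up generic "combineAll" and a made-up class "Toy"; it is not
how rbind() is implemented, only an illustration of the same mechanism.

```r
library(methods)

# Hypothetical toy class.
setClass("Toy", representation(aa = "integer"))

# Hypothetical generic that dispatches on '...' only.
setGeneric("combineAll",
    function(..., recursive = FALSE) standardGeneric("combineAll"),
    signature = "..."
)

# Concatenate the 'aa' slots of all supplied objects.
setMethod("combineAll", "Toy",
    function(..., recursive = FALSE) {
        args <- list(...)
        new("Toy", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    }
)

t1 <- new("Toy", aa = 1:3)
t2 <- new("Toy", aa = 22:25)

unnamed <- combineAll(t1, t2)
named   <- combineAll(a = t1, b = t2)

identical(named, unnamed)  # TRUE: argument names do not affect dispatch
```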
>
> Comments/suggestions to address this are welcome.
>
>
>
> The issue is that (unlike 'rbind') 'c' is a primitive and dispatch for
> primitives is hard-coded in C. C-level dispatch is a simplified variant
> of the R implementation, so I'm guessing it does not work with "...".
>
> Btw, you can get a peek at the 'c' generic with:
> > getGeneric("c")
> standardGeneric for "c" defined from package "base"
>
> function (x, ..., recursive = FALSE)
> standardGeneric("c", .Primitive("c"))
> <bytecode: 0x382af20>
> <environment: 0x34d6878>
> Methods may be defined for arguments: x, recursive
> Use showMethods("c") for currently available ones.
>
> Michael
>
> Thanks,
> H.
>
>
>
>
> On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
>
> Hi,
>
> The following shows that do.call of `c` on a list of IRangesList
> returns "list" only when the list is named.
>
> library(IRanges)
> example(IRangesList)
> class(x)
>
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
> class(do.call(c,list(x1=x,x2=x)))
>
> [1] "list"
>
> I am confused by this.
>
> I would not expect the fact that the list is named to have any
> impact on the result.
>
> But, look, when the list names are omitted, the class is now an IRangesList:
>
> class(do.call(c,list(x,x)))
>
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
> class(c(x,x))
>
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
> A 'workaround' is to unname the list, as demonstrated:
>
> class(do.call(c,unname(list(x1=x,x2=x))))
>
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
> But, why does having a 'names' attribute affect the behavior of
> do.calling `c` so much as to change the class returned?
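
Part of the answer is that do.call() turns the names of the list into
argument names of the call, so a named list produces a call with named
arguments. A minimal base-R sketch, using a made-up helper "arg_names" and a
placeholder value in place of an IRangesList:

```r
# Hypothetical helper: report the argument names a function was called with.
arg_names <- function(...) names(list(...))

v <- 1:3  # placeholder value; any object would do

direct    <- arg_names(x1 = v, x2 = v)                         # "x1" "x2"
via_named <- do.call(arg_names, list(x1 = v, x2 = v))          # "x1" "x2"
via_bare  <- do.call(arg_names, unname(list(x1 = v, x2 = v)))  # NULL
```

With unname(), the call carries no argument names at all, so the first
element is matched positionally.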
>
>
> Thanks for your help/education.....
>
> Malcolm Cook
> Computational Biology - Stowers Institute for Medical Research
>
> sessionInfo()
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] IRanges_1.16.4 BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.20.3 BSgenome_1.26.1 Biobase_2.18.0
> Biostrings_2.26.2 DBI_0.2-5
> GenomicFeatures_1.10.1 GenomicRanges_1.10.5 RCurl_1.95-3
> RSQLite_0.11.2 Rsamtools_1.10.2 XML_3.95-0.1
> biomaRt_2.14.0 bitops_1.0-4.2 colorspace_1.2-0
> data.table_1.8.6 functional_0.1 graph_1.36.1
> gtools_2.7.0 parallel_2.15.1
> rtracklayer_1.18.1 stats4_2.15.1 tools_2.15.1
> zlibbioc_1.4.0
>
>
> _______________________________________________
> Bioconductor mailing list
>
> Bioconductor at r-project.org
>
>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319