[BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named
Cook, Malcolm
MEC at stowers.org
Fri Dec 14 16:45:36 CET 2012
Herve,
Excellent news! I look forward to seeing your contrib on R-devel wend its way world-wide.
Beaucoups woohoos and mucho kudos to you,
;)
~Malcolm
.-----Original Message-----
.From: Hervé Pagès [mailto:hpages at fhcrc.org]
.Sent: Thursday, December 13, 2012 6:15 PM
.To: Cook, Malcolm
.Cc: 'Michael Lawrence'; 'bioconductor at r-project.org'
.Subject: Re: [BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named
.
.On 12/13/2012 12:24 PM, Cook, Malcolm wrote:
.> Thanks for digging into this, Herve, Michael.
.>
.> Herve, I really appreciate your following up on R-devel, such as you
.> recently did that got mapply 'fixed' to work natively with Bioc's List
.> and friends (c.f. http://developer.r-project.org/blosxom.cgi/R-devel/NEWS)
.>
.> I don't think re-defining c as a generic in BioConductor is a good
.> workaround, for the reasons you mentioned Herve. The issue will just
.> crop up again with someone else's non BioC S4 class structure.
.>
.> It really is not a BioConductor issue at all.
.>
.> If this can also be kicked upstream, that would serve others as well.
.
.Glups, I wrote a long answer about the pros and cons of putting stuff
.in BiocGenerics vs trying to push it into mainstream R. I was about to
.press the Send button but, before doing so, decided to have a quick
.look at the source of the methods package (following Michael's suggestion)
.just to confirm my feeling that this would be a tough one, so tough
.that my previous workaround would suddenly sound much more appealing.
.I was psychologically and emotionally prepared to have a rough time,
.but, surprisingly, I didn't. Here is the patch:
.
.hpages at thinkpad:~/biocprojects/c_implicit_generic/R-devel$ svn diff
.Index: src/library/methods/R/BasicFunsList.R
.===================================================================
.--- src/library/methods/R/BasicFunsList.R (revision 61310)
.+++ src/library/methods/R/BasicFunsList.R (working copy)
.@@ -46,7 +46,7 @@
. , "%*%" = function(x, y) standardGeneric("%*%")
. , "xtfrm" = function(x) standardGeneric("xtfrm")
. ### these have a different arglist from the primitives
.-, "c" = function(x, ..., recursive = FALSE) standardGeneric("c")
.+, "c" = function(..., recursive = FALSE) standardGeneric("c")
. , "all" = function(x, ..., na.rm = FALSE) standardGeneric("all")
. , "any" = function(x, ..., na.rm = FALSE) standardGeneric("any")
. , "sum" = function(x, ..., na.rm = FALSE) standardGeneric("sum")
.
.Yes, a 1-liner! I did very little testing but it seems to work fine :-)
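
What the `(..., recursive = FALSE)` argument list buys can be tried without patching R or masking base::c(). The sketch below is only an illustration under stated assumptions: `combineAll` and the class `Box` are hypothetical names, not part of R or Bioconductor, and the generic is an ordinary (non-primitive) function that merely borrows the patched argument list and dispatches on `...`.

```r
library(methods)

# Hypothetical toy class, used only for this illustration.
setClass("Box", representation(aa = "integer"))

# Hypothetical stand-in generic: same argument list as the patched
# implicit generic for c(), with dispatch on '...'.
setGeneric("combineAll", signature = "...",
    def = function(..., recursive = FALSE) standardGeneric("combineAll"))

# Method for the case where every argument in '...' is a Box.
setMethod("combineAll", "Box", function(..., recursive = FALSE) {
    args <- list(...)
    new("Box", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
})

b1 <- new("Box", aa = 1:3)
b2 <- new("Box", aa = 22:25)

combineAll(b1, b2)@aa          # unnamed arguments
combineAll(p = b1, q = b2)@aa  # all arguments named, none of them 'x'
```

Because dispatch looks at everything in `...` rather than at a single formal argument, the names given to the arguments play no role in method selection here.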
.
.I'll do more testing before I send this to R-devel. Thanks for the
.encouragement.
.
.H.
.
.>
.> Thoughts?
.>
.> ~Malcolm
.>
.> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
.> *Sent:* Thursday, December 13, 2012 11:13 AM
.> *To:* Hervé Pagès
.> *Cc:* Cook, Malcolm; Michael Lawrence; bioconductor at r-project.org
.> *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
.> IRangesList returns "list" only when the list is named
.>
.> Probably better to bring this issue to the attention of John Chambers.
.> Since he's invited us to start hacking on the methods package, this
.> might be a good opportunity to smooth out some of these rough edges.
.>
.>
.> Michael
.>
.> On Wed, Dec 12, 2012 at 6:46 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
.>
.> Hi Malcolm,
.>
.> I'm not sure what the reasons are for the current behaviour
.> of the c() generic, if they're just historical, or if there
.> is something deeper, or...
.>
.> My view on the "primitive" status of a function is that it should
.> be an implementation detail, maybe an important one, but a
.> detail anyway in the sense that being implemented as a .Primitive
.> or an .Internal or just in plain R should not affect the semantics
.> of a function. Interestingly there is a short comment in ?.Primitive
.> suggesting that people's code should not depend on knowing which
.> functions are primitive because this does change as R evolves.
.> Unfortunately the reality is very different: there are situations
.> where you definitely need to know that something is a primitive,
.> just because argument passing (and consequently method dispatch)
.> works differently.
.>
.> On a more positive note, I found a hack that allows c() to dispatch
.> on ...:
.>
.> setGeneric("c", signature="...",
.>     function(..., recursive=FALSE)
.>         standardGeneric("c"),
.>     useAsDefault=function(..., recursive=FALSE)
.>         base::c(..., recursive=recursive)
.> )
.>
.> Then:
.>
.> setClass("A", representation(aa="integer"))
.>
.> setMethod("c", "A",
.>     function(..., recursive=FALSE)
.>     {
.>         args <- list(...)
.>         ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
.>         new("A", aa=ans_aa)
.>     }
.> )
.>
.> > a1 <- new("A", aa=1:3)
.> > a2 <- new("A", aa=22:25)
.>
.> > c(a1, a2)
.> An object of class "A"
.> Slot "aa":
.> [1] 1 2 3 22 23 24 25
.>
.> > c(a1, x=a2)
.> An object of class "A"
.> Slot "aa":
.> [1] 1 2 3 22 23 24 25
.>
.> > c(A=a1, B=a2)
.> An object of class "A"
.> Slot "aa":
.> [1] 1 2 3 22 23 24 25
.>
.> Overriding base::c() with our own c() is pretty invasive though and
.> I didn't test it enough to guarantee that it doesn't break or slow down
.> things.
.>
.> Also one important thing to note is that this signature doesn't
.> allow specific methods to implement extra arguments (like the "c"
.> method for GenomicRanges does), which kind of makes sense because
.> the generic function is putting named args that are not named
.> 'recursive' in ..., and dispatches on them. The same restriction
.> applies to the cbind() and rbind() generics:
.>
.> > setMethod("cbind", "A", function(..., deparse.level=1,
.> my.toggle=FALSE) NULL)
.> Creating a generic function for 'cbind' from package 'base' in the
.> global environment
.> in method for 'cbind' with signature '"A"': no definition for class "A"
.> Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
.> arguments (deparse.level) after '...' in the generic must appear in
.> the method, in the same place at the end of the argument list
.>
.> So some of the "c" methods would need to be revisited.
.>
.> Anyway, would need serious testing before adding this generic to
.> BiocGenerics. Is it worth it?
.>
.> Cheers,
.> H.
.>
.>
.>
.>
.> On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
.>
.> Steve, Michael, Herve, all
.>
.> As always, "illuminating".
.>
.> And, as often, frustrating.
.>
.> I am clear how unname serves as a workaround for my current purpose.
.> So, I can proceed.
.>
.> But, I remain unclear if this (to me, odd) behavior of `base::c` is
.> desirable or justifiable in any sense of the word. Is this informed by
.> a rational language design, or, as Mike suggests, the result of layering
.> an OO design onto a functional base?
.>
.> In your opinion, should this issue be raised on R-devel? Or is it a
.> "waste of time"?
.>
.> Thanks for your thoughts/help.
.>
.> ~Malcolm
.>
.> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
.> *Sent:* Monday, December 03, 2012 11:31 AM
.> *To:* Hervé Pagès
.> *Cc:* Cook, Malcolm; bioconductor at r-project.org
.> *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
.> IRangesList returns "list" only when the list is named
.>
.> On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
.>
.> Hi Malcolm,
.>
.> The problem you are describing can be reproduced by calling c()
.> directly on S4 objects.
.>
.> * With unnamed arguments:
.>
.> > c(IRanges(), IRanges())
.> IRanges of length 0
.>
.> > c(Rle(), Rle())
.> logical-Rle of length 0 with 0 runs
.> Lengths:
.> Values :
.>
.> * With named arguments:
.>
.> > c(a=IRanges(),b=IRanges())
.> $a
.> IRanges of length 0
.>
.> $b
.> IRanges of length 0
.>
.> > c(a=Rle(), b=Rle())
.> $a
.> logical-Rle of length 0 with 0 runs
.> Lengths:
.> Values :
.>
.> $b
.> logical-Rle of length 0 with 0 runs
.> Lengths:
.> Values :
.>
.> This statement (found in man page for base::c()) is showing what the
.> root of the problem is:
.>
.> S4 methods:
.>
.> This function is S4 generic, but with argument list '(x, ...,
.> recursive = FALSE)'.
.>
.> Note that, to make things a little bit more confusing, it's not totally
.> accurate that c() is an S4 generic, at least not on a fresh session:
.>
.> > isGeneric("c")
.> [1] FALSE
.>
.> So my understanding of the above statement is that c() will
.> automatically be turned into an S4 generic at the moment you try
.> to define an S4 method for it, and, for obscure reasons that I'm not
.> sure I understand, the argument list used in the definition of this
.> S4 method must start with 'x'. The consequence of all this is that
.> dispatch will happen on 'x' so if named arguments are passed with
.> a name that is not 'x', dispatch will fail and the default method
.> (which is base::c()) will be called :-b
.>
.> This explains why things work as expected in the following situations:
.>
.> > c(IRanges(), b=IRanges())
.> IRanges of length 0
.>
.> > c(a=IRanges(), IRanges())
.> IRanges of length 0
.>
.> > c(a=IRanges(), x=IRanges())
.> IRanges of length 0
.>
.> But when all the arguments are named with names != 'x', then nothing
.> is passed to 'x' and dispatch fails.
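
The argument-matching step behind this can be seen without any S4 machinery. The sketch below uses a hypothetical plain R function, `probe` (not part of R or Bioconductor), that merely borrows the `(x, ..., recursive = FALSE)` argument list and reports whether anything landed in `x`. It illustrates ordinary R argument matching only, not the C-level dispatch code for primitives.

```r
# Hypothetical helper: same argument list as the implicit generic for c(),
# but all it does is report whether the formal argument 'x' was filled.
probe <- function(x, ..., recursive = FALSE) {
    if (missing(x)) "x is missing" else "x is supplied"
}

probe(1, b = 2)      # the unnamed argument is matched to 'x'
probe(a = 1, 2)      # 'a' goes into '...', the unnamed 2 is matched to 'x'
probe(a = 1, x = 2)  # the argument named 'x' is matched to 'x'
probe(a = 1, b = 2)  # nothing is matched to 'x': both arguments go into '...'
```

Only the last call leaves `x` empty, which corresponds to the case where there is nothing of the S4 class to dispatch on.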
.>
.> I didn't have much luck so far with my attempts to work around this:
.>
.> 1. Trying to change the signature of the c() generic:
.>
.> > setGeneric("c", signature="...")
.> Error in setGeneric("c", signature = "...") :
.> 'c' is a primitive function; methods can be defined, but
.> the generic function is implicit, and cannot be changed.
.>
.> 2. Trying to dispatch on "missing" or "ANY":
.>
.> > setMethod("c", "missing", function(x, ..., recursive=FALSE)
.> "YES!")
.> Error in setMethod("c", "missing", function(x, ..., recursive =
.> FALSE) "YES!") :
.> the method for function 'c' and signature x="missing" is sealed
.> and cannot be re-defined
.>
.> > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
.> Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE)
.> "YES!") :
.> the method for function 'c' and signature x="ANY" is sealed and
.> cannot be re-defined
.>
.> With old versions of R, dispatch on ... was not possible, i.e. ... was not
.> allowed to be in the signature of the generic. This was changed in
.> recent versions of R and we're already using this new feature for a
.> few S4 generics defined in BiocGenerics e.g. for cbind() and rbind():
.>
.> > library(BiocGenerics)
.> > rbind
.> standardGeneric for "rbind" defined from package "BiocGenerics"
.>
.> function (..., deparse.level = 1)
.> standardGeneric("rbind")
.> <environment: 0x29b96b0>
.> Methods may be defined for arguments: ...
.> Use showMethods("rbind") for currently available ones.
.>
.> And dispatch works as expected, with or without named arguments:
.>
.> > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
.> DataFrame with 6 rows and 2 columns
.>           X         Y
.>   <integer> <integer>
.> 1         1        11
.> 2         2        12
.> 3         3        13
.> 4         1        21
.> 5         2        22
.> 6         3        23
.>
.> > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
.> DataFrame with 6 rows and 2 columns
.>           X         Y
.>   <integer> <integer>
.> 1         1        11
.> 2         2        12
.> 3         3        13
.> 4         1        21
.> 5         2        22
.> 6         3        23
.>
.> So I wonder if the weird behavior of c() is still justified.
.>
.> Comments/suggestions to address this are welcome.
.>
.>
.>
.> The issue is that (unlike 'rbind') 'c' is a primitive and dispatch for
.> primitives is hard-coded in C. C-level dispatch is a simplified variant
.> of the R implementation, so I'm guessing it does not work with "...".
.>
.> Btw, you can get a peek at the 'c' generic with:
.> > getGeneric("c")
.> standardGeneric for "c" defined from package "base"
.>
.> function (x, ..., recursive = FALSE)
.> standardGeneric("c", .Primitive("c"))
.> <bytecode: 0x382af20>
.> <environment: 0x34d6878>
.> Methods may be defined for arguments: x, recursive
.> Use showMethods("c") for currently available ones.
.>
.> Michael
.>
.> Thanks,
.> H.
.>
.>
.>
.>
.> On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
.>
.> Hi,
.>
.> The following shows that do.call of `c` on a list of IRangesList
.> returns "list" only when the list is named.
.>
.> library(IRanges)
.> example(IRangesList)
.> class(x)
.>
.> [1] "CompressedIRangesList"
.> attr(,"package")
.> [1] "IRanges"
.>
.> class(do.call(c,list(x1=x,x2=x)))
.>
.> [1] "list"
.>
.> I am confused by this.
.>
.> I would not expect the fact that the list is named to have any
.> impact on the result.
.>
.> But look: when the list names are omitted, the class is now an IRangesList
.>
.> class(do.call(c,list(x,x)))
.>
.> [1] "CompressedIRangesList"
.> attr(,"package")
.> [1] "IRanges"
.>
.> class(c(x,x))
.>
.> [1] "CompressedIRangesList"
.> attr(,"package")
.> [1] "IRanges"
.>
.> A 'workaround' is to unname the list, as demonstrated:
.>
.> class(do.call(c,unname(list(x1=x,x2=x))))
.>
.> [1] "CompressedIRangesList"
.> attr(,"package")
.> [1] "IRanges"
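
The unname() workaround can be wrapped in a small helper. The sketch below is an illustration under stated assumptions: `Toy` is a hypothetical S4 class standing in for an IRangesList-like container, and `combine_unnamed` is a hypothetical helper name. Neither comes from IRanges, and the "c" method is written against the implicit generic that dispatches on 'x'.

```r
library(methods)

# Hypothetical toy class standing in for an S4 container such as IRangesList.
setClass("Toy", representation(values = "integer"))

# A "c" method for the implicit generic, which dispatches on 'x'.
setMethod("c", "Toy", function(x, ..., recursive = FALSE) {
    parts <- c(list(x), list(...))
    new("Toy", values = unlist(lapply(parts, slot, "values"), use.names = FALSE))
})

# Hypothetical helper: drop the list names before do.call() so that the
# first element is matched to 'x' and the S4 method gets selected.
combine_unnamed <- function(lst) do.call(c, unname(lst))

t1 <- new("Toy", values = 1:3)
t2 <- new("Toy", values = 22:25)

res <- combine_unnamed(list(first = t1, second = t2))
class(res)
res@values
```

The cost of this approach is that the list names are discarded, so it only fits cases where the names carry no information that the combined object needs to keep.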
.>
.> But, why does having a 'names' attribute affect the behavior of
.> do.calling `c` so much as to change the class returned?
.>
.>
.> Thanks for your help/education.....
.>
.> Malcolm Cook
.> Computational Biology - Stowers Institute for Medical Research
.>
.> sessionInfo()
.>
.> R version 2.15.1 (2012-06-22)
.> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
.>
.> locale:
.> [1] C
.>
.> attached base packages:
.> [1] stats graphics grDevices utils datasets methods
.> base
.>
.> other attached packages:
.> [1] IRanges_1.16.4 BiocGenerics_0.4.0
.>
.> loaded via a namespace (and not attached):
.> [1] AnnotationDbi_1.20.3 BSgenome_1.26.1 Biobase_2.18.0
.> Biostrings_2.26.2 DBI_0.2-5
.> GenomicFeatures_1.10.1 GenomicRanges_1.10.5 RCurl_1.95-3
.> RSQLite_0.11.2 Rsamtools_1.10.2 XML_3.95-0.1
.> biomaRt_2.14.0 bitops_1.0-4.2 colorspace_1.2-0
.> data.table_1.8.6 functional_0.1 graph_1.36.1
.> gtools_2.7.0 parallel_2.15.1
.> rtracklayer_1.18.1 stats4_2.15.1 tools_2.15.1
.> zlibbioc_1.4.0
.>
.>
.> _______________________________________________
.> Bioconductor mailing list
.>
.> Bioconductor at r-project.org
.>
.>
.> https://stat.ethz.ch/mailman/listinfo/bioconductor
.> Search the archives:
.> http://news.gmane.org/gmane.science.biology.informatics.conductor
.>
.> --
.> Hervé Pagès
.>
.> Program in Computational Biology
.> Division of Public Health Sciences
.> Fred Hutchinson Cancer Research Center
.> 1100 Fairview Ave. N, M1-B514
.> P.O. Box 19024
.> Seattle, WA 98109-1024
.>
.> E-mail: hpages at fhcrc.org
.> Phone: (206) 667-5791
.> Fax: (206) 667-1319
.>
.
.--
.Hervé Pagès
.
.Program in Computational Biology
.Division of Public Health Sciences
.Fred Hutchinson Cancer Research Center
.1100 Fairview Ave. N, M1-B514
.P.O. Box 19024
.Seattle, WA 98109-1024
.
.E-mail: hpages at fhcrc.org
.Phone: (206) 667-5791
.Fax: (206) 667-1319