[Bioc-devel] Overloading subset operator for an S4 object with more than two dimensions
Christian Arnold
christian.arnold at embl.de
Mon May 18 15:06:33 CEST 2015
Thanks for your input, highly appreciated!
I can see that the semantics of "[" are violated, so I agree that
overloading the "subset" method is probably a better way to go.
Essentially, the object stores several individual-specific count
matrices from RNA-Seq experiments in a potentially allele-specific
(read-group-specific) manner. So the dimensions to subset on are the
read groups, the rows and columns of the matrices, and the individuals
themselves.
So I guess overloading the subset method with four arguments, each
corresponding to one of the dimensions along which this kind of object
can be subset, is the way to go.
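A minimal sketch of that idea, using the toy class "A" from my original
post rather than the real count-matrix object. The argument names i, j,
k and l and the convention that NULL means "keep everything" are
assumptions made for illustration only, not a settled interface:

```r
library(methods)

# Toy class from the original post: four numeric slots standing in for
# the four "dimensions" of the real object.
setClass("A", representation(a = "numeric", b = "numeric",
                             c = "numeric", d = "numeric"))

# subset() has the signature (x, ...), so an S4 method may add named
# arguments that are matched from "...". i, j, k, l are hypothetical names.
setMethod("subset", "A",
    function(x, i = NULL, j = NULL, k = NULL, l = NULL, ...) {
        # Keep the whole slot when no index was supplied for it.
        pick <- function(values, index) {
            if (is.null(index)) values else values[index]
        }
        initialize(x,
                   a = pick(x@a, i),
                   b = pick(x@b, j),
                   c = pick(x@c, k),
                   d = pick(x@d, l))
    })

obj <- new("A", a = 1:5, b = 1:5, c = 1:5, d = 1:5)
res <- subset(obj, i = 1, j = 2, k = 3:4, l = 5)
```

Because each index is passed as a whole vector instead of going through
ifelse(), vector indices such as k = 3:4 should be kept intact, and any
dimension that is left out stays unsubsetted.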
Thanks,
Christian
On 14.05.2015 15:57, Michael Lawrence wrote:
> I agree with Wolfgang that the semantics of [ are being violated here.
> It would though help if you could be a little less vague about your
> intent. What is this data structure going to store, how should it behave?
>
> On Thu, May 14, 2015 at 3:35 AM, Christian Arnold
> <christian.arnold at embl.de> wrote:
>
> Hi there,
>
> I am about to develop a Bioconductor package that implements a
> custom S4 object, and I am currently thinking about a few issues,
> including the following:
>
> Say we have an S4 object that stores a lot of information in
> different slots. Assume that it does make sense to extract
> information out of this object in four different "dimensions"
> (conceptually similar to a four-dimensional object), so one would
> like to use the subset "[" operator for this, but extending beyond
> the "typical" one or two dimensions to 4:
>
> setClass("A",
> representation=representation(a="numeric",b="numeric",c="numeric",d="numeric"))
> a = new("A", a=1:5,b=1:5,c=1:5,d=1:5)
>
> Now it would be nice to do stuff like a[1,2,3:4,5], which should
> simply return the selected elements in slots a, b, c, and d,
> respectively. So a[1,2,3:4,5] would return:
>
> An object of class "A"
> Slot "a":
> [1] 1
>
> Slot "b":
> [1] 2
>
> Slot "c":
> [1] 3 4
>
> Slot "d":
> [1] 5
>
> This is how far I've come:
>
> setMethod("[", c("A", "ANY", "ANY","ANY"),
> function(x, i, j, ..., drop=TRUE)
> {
> dots <- list(...)
> if (length(dots) > 2) {
> stop("Too many arguments, must be four dimensional")
> }
>
> # Parse the extra two dimensions that we need from the ... argument
> k = ifelse(length(dots) > 0 , dots[[1]], c(1:5))
> l = ifelse(length(dots) == 2, dots[[2]], c(1:5))
>
> initialize(x, a=x@a[i],b=x@b[j],c=x@c[k],d=x@d[l])
> })
>
> This works for stuff like a[1,2,3, 4], but fails with a general
> error if one of the indices is a vector such as a[1:2,2,3, 4] or
> a[1,2,3,4:5].
>
>
> So, in summary, my questions are:
> 1. Is there a reasonable way of achieving the 4-dimensional
> subsetting that works as a user would expect it to work?
> 2. Does it make more sense to write a custom function instead to
> achieve this, such as subsetObject() without overloading "["
> explicitly? What are the Bioconductor recommendations here?
>
> I'd appreciate any help, suggestions, etc!
>
> Thanks,
> Christian
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
--
—————————————————————————
Christian Arnold, PhD
Staff Bioinformatician
SCB Unit - Computational Biology
Joint appointment Genome Biology
Joint appointment European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory (EMBL)
Meyerhofstrasse 1; 69117, Heidelberg, Germany
Email: christian.arnold at embl.de
Phone: +49(0)6221-387-8472
Web: http://www.embl.de/research/units/scb/zaugg/