[Bioc-devel] Overloading subset operator for an S4 object with more than two dimensions

Martin Morgan mtmorgan at fredhutch.org
Mon May 18 15:33:25 CEST 2015


On 05/18/2015 06:06 AM, Christian Arnold wrote:
>
> Thanks for your input, highly appreciated!
>
> I can see that the semantics of "[" are violated, so I agree that
> overriding the "subset" method is probably a better way to go.
> Essentially, the object stores several individual-specific count
> matrices from RNA-Seq experiments in a potentially allele (read
> group)-specific manner. So the dimensions to subset on are the read

Maybe this is a SummarizedExperiment with different assays()? This would be
appropriate if each assay had the same regions-of-interest (GRanges or
GRangesList) x sample dimensions, so it may not be relevant to you.

In Bioc 'devel':

   library(SummarizedExperiment)
   ## allele-specific counts, two alleles
   m1 = matrix(rbinom(1000, 100, .1), 100, dimnames=list(NULL, LETTERS[1:10]))
   m2 = matrix(rbinom(1000, 100, .1), 100, dimnames=list(NULL, LETTERS[1:10]))
   se = SummarizedExperiment(assays=list(a1=m1, a2=m2))
   se[1:5,]                            # regions 1-5, across assays
   assays(se[,c("A", "B")])[["a2"]]    # assay a2 for samples "A", "B"


> groups, the rows and columns of the matrices, and the individuals themselves.
>
> So I guess overloading the subset method with four arguments, each
> corresponding to one of the dimensions on which this kind of object
> can be subset, is the way to go.
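A rough sketch of such a four-argument subsetting function, using the toy class "A" from the original post rather than the real count-matrix object. The name subsetA() and its arguments i, j, k, l are made up for illustration, and a plain function is used here instead of an S4 method to keep the example short:

```r
library(methods)

## Toy class from the original post
setClass("A",
    representation(a="numeric", b="numeric", c="numeric", d="numeric"))

## Hypothetical helper (name and arguments are illustrative only): one
## index per slot; a missing index keeps that slot unchanged
subsetA <- function(x, i, j, k, l) {
    if (missing(i)) i <- seq_along(x@a)
    if (missing(j)) j <- seq_along(x@b)
    if (missing(k)) k <- seq_along(x@c)
    if (missing(l)) l <- seq_along(x@d)
    initialize(x, a=x@a[i], b=x@b[j], c=x@c[k], d=x@d[l])
}

obj <- new("A", a=1:5, b=1:5, c=1:5, d=1:5)
res <- subsetA(obj, 1, 2, 3:4, 5)    # vector indices are allowed
res2 <- subsetA(obj, k=4:5)          # subset only the third "dimension"
```

Because every index is a named argument, there is no need to unpack '...', and an index can be left out without the positional ambiguity of "[".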
>
> Thanks,
> Christian
>
>
> On 14.05.2015 15:57, Michael Lawrence wrote:
>> I agree with Wolfgang that the semantics of [ are being violated here.
>> It would though help if you could be a little less vague about your
>> intent. What is this data structure going to store, how should it behave?
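For context, a small sketch of what the usual "[" semantics look like on a base R four-dimensional array. This is an illustration only, not something taken from the thread, and the array 'arr' is made up. The four indices jointly select one rectangular block of a single data set, which is presumably the expectation that subsetting four independent slots would break:

```r
## A 5 x 5 x 5 x 5 array; every cell belongs to the same data set
arr <- array(seq_len(5^4), dim = c(5, 5, 5, 5))

## Four indices together pick out one rectangular block
sub <- arr[1, 2, 3:4, 5, drop = FALSE]
dim(sub)    # 1 1 2 1
```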
>>
>> On Thu, May 14, 2015 at 3:35 AM, Christian Arnold
>> <christian.arnold at embl.de> wrote:
>>
>>      Hi there,
>>
>>      I am about to develop a Bioconductor package that implements a
>>      custom S4 object, and I am currently thinking about a few issues,
>>      including the following:
>>
>>      Say we have an S4 object that stores a lot of information in
>>      different slots. Assume that it does make sense to extract
>>      information out of this object in four different "dimensions"
>>      (conceptually similar to a four-dimensional object), so one would
>>      like to use the subset "[" operator for this, but extending beyond
>>      the "typical" one or two dimensions to 4:
>>
>>      setClass("A",
>>      representation=representation(a="numeric",b="numeric",c="numeric",d="numeric"))
>>      a = new("A", a=1:5,b=1:5,c=1:5,d=1:5)
>>
>>      Now it would be nice to do stuff like a[1,2,3:4,5], which should
>>      simply return the selected elements in slots a, b, c, and d,
>>      respectively. So a[1,2,3:4,5] would return:
>>
>>      An object of class "A"
>>      Slot "a":
>>      [1] 1
>>
>>      Slot "b":
>>      [1] 2
>>
>>      Slot "c":
>>      [1] 3 4
>>
>>      Slot "d":
>>      [1] 5
>>
>>      This is how far I've come:
>>
>>      setMethod("[", c("A", "ANY", "ANY","ANY"),
>>                function(x, i, j, ..., drop=TRUE)
>>                {
>>                  dots <- list(...)
>>                  if (length(dots) > 2) {
>>                    stop("Too many arguments, must be four dimensional")
>>                  }
>>
>>                  # Parse the extra two dimensions that we need from the ... argument
>>                  k = ifelse(length(dots) > 0 , dots[[1]], c(1:5))
>>                  l = ifelse(length(dots) == 2, dots[[2]], c(1:5))
>>
>>                  initialize(x, a=x@a[i],b=x@b[j],c=x@c[k],d=x@d[l])
>>                })
>>
>>      This works for stuff like a[1,2,3, 4], but fails with a general
>>      error if one of the indices is a vector such as a[1:2,2,3, 4] or
>>      a[1,2,3,4:5].
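One thing that may contribute here, although it may not account for the reported error: ifelse() returns a value shaped like its test, so with a length-one test only the first element of a vector index survives. A small sketch, where 'dots' stands in for list(...) inside the method:

```r
## Stand-in for list(...) when the call is a[1, 2, 3:4, 4:5]
dots <- list(3:4, 4:5)

## ifelse() is vectorised over the test; a length-1 test gives a
## length-1 result, so the second element of 3:4 is dropped
k_bad <- ifelse(length(dots) > 0, dots[[1]], 1:5)

## A plain if / else returns the whole vector
k_ok <- if (length(dots) > 0) dots[[1]] else 1:5

length(k_bad)    # 1
length(k_ok)     # 2
```

The same applies to the default: ifelse(FALSE, dots[[1]], 1:5) yields only the first element of 1:5, not all five.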
>>
>>
>>      So, in summary, my questions are:
>>      1. Is there a reasonable way of achieving the 4-dimensional
>>      subsetting that works as a user would expect it to work?
>>      2. Does it make more sense to write a custom function instead to
>>      achieve this, such as subsetObject() without overloading "["
>>      explicitly? What are the Bioconductor recommendations here?
>>
>>      I'd appreciate any help, suggestions, etc!
>>
>>      Thanks,
>>      Christian
>>
>>      _______________________________________________
>>      Bioc-devel at r-project.org mailing list
>>      https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


