[Bioc-devel] Overloading subset operator for an S4 object with more than two dimensions

Wolfgang Huber whuber at embl.de
Thu May 14 13:18:42 CEST 2015


Dear Christian

I am not sure this is a wise idea: it breaks the semantics of "[".
The number of elements stored in an array is the product of the extent of its dimensions.
In your example, it is the sum.
To put it less abstractly, a[1:2, 2, 3:4, 1] for a regular array is a 2 x 2 matrix, whereas in your construct it is something with 2 + 1 + 2 + 1 = 6 numbers in it.
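
A small sketch of the difference, assuming a regular array with the same extents as your example (5 in each of four dimensions). The names x and s are only for illustration:

```r
# A regular 4-d array: the element count is the product of the extents.
x <- array(seq_len(5^4), dim = c(5, 5, 5, 5))
length(x)                            # 625 = 5 * 5 * 5 * 5

# Subsetting keeps the product rule; extents of 1 are dropped by default.
s <- x[1:2, 2, 3:4, 1]
dim(s)                               # 2 2
length(s)                            # 4 = 2 * 1 * 2 * 1

# The proposed object would instead hold the sum of the index lengths.
sum(lengths(list(1:2, 2, 3:4, 1)))   # 6
```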

As you say, it looks like you want something like the semantics of `subset` (base package) or `filter` (dplyr), so using such method names would be more intuitive.
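
A minimal sketch of what that could look like for your class "A", assuming one optional index argument per slot. The argument names a, b, c, d and the "missing index keeps the whole slot" rule are my assumptions for illustration, not an established Bioconductor convention:

```r
library(methods)

setClass("A", representation(a = "numeric", b = "numeric",
                             c = "numeric", d = "numeric"))

# Promote base::subset to an S4 generic so a method for "A" can be added.
setGeneric("subset")

# Hypothetical interface: one optional index per slot; a missing index
# leaves that slot untouched.
setMethod("subset", "A", function(x, a, b, c, d, ...) {
  if (!missing(a)) x@a <- x@a[a]
  if (!missing(b)) x@b <- x@b[b]
  if (!missing(c)) x@c <- x@c[c]
  if (!missing(d)) x@d <- x@d[d]
  x
})

obj <- new("A", a = 1:5, b = 1:5, c = 1:5, d = 1:5)
res <- subset(obj, a = 1, b = 2, c = 3:4, d = 5)
res@c    # 3 4
```

Because the indices are passed by name, vector indices such as c = 3:4 need no special handling, and nobody reading the call will expect array-like behaviour from it.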


Wolfgang





> On May 14, 2015, at 12:35 GMT+2, Christian Arnold <christian.arnold at embl.de> wrote:
> 
> Hi there,
> 
> I am about to develop a Bioconductor package that implements a custom S4 object, and I am currently thinking about a few issues, including the following:
> 
> Say we have an S4 object that stores a lot of information in different slots. Assume that it does make sense to extract information out of this object in four different "dimensions" (conceptually similar to a four-dimensional object), so one would like to use the subset "[" operator for this, but extending beyond the "typical" one or two dimensions to 4:
> 
> setClass("A", representation=representation(a="numeric",b="numeric",c="numeric",d="numeric"))
> a = new("A", a=1:5,b=1:5,c=1:5,d=1:5)
> 
> Now it would be nice to do stuff like a[1,2,3:4,5], which should simply return the selected elements in slots a, b, c, and d, respectively. So a[1,2,3:4,5] would return:
> 
> An object of class "A"
> Slot "a":
> [1] 1
> 
> Slot "b":
> [1] 2
> 
> Slot "c":
> [1] 3 4
> 
> Slot "d":
> [1] 5
> 
> This is how far I've come:
> 
> setMethod("[", c("A", "ANY", "ANY","ANY"),
>          function(x, i, j, ..., drop=TRUE)
>          {
>            dots <- list(...)
>            if (length(dots) > 2) {
>              stop("Too many arguments, must be four dimensional")
>            }
> 
>            # Parse the extra two dimensions that we need from the ... argument
>            k = ifelse(length(dots) > 0 , dots[[1]], c(1:5))
>            l = ifelse(length(dots) == 2, dots[[2]], c(1:5))
> 
>            initialize(x, a=x@a[i], b=x@b[j], c=x@c[k], d=x@d[l])
>          })
> 
> This works for stuff like a[1,2,3, 4], but fails with a general error if one of the indices is a vector such as a[1:2,2,3, 4] or a[1,2,3,4:5].
> 
> 
> So, in summary, my questions are:
> 1. Is there a reasonable way of achieving the 4-dimensional subsetting that works as a user would expect it to work?
> 2. Does it make more sense to write a custom function instead to achieve this, such as subsetObject() without overloading "[" explicitly? What are the Bioconductor recommendations here?
> 
> I'd appreciate any help, suggestions, etc!
> 
> Thanks,
> Christian
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
