[Bioc-devel] rbind for ExpressionSet objects?

Robert Gentleman rgentlem at fhcrc.org
Mon May 5 18:32:55 CEST 2008


The idea is for rbind2 and cbind2 to be able to deal with multiple 
arguments by combining them pairwise, where there is some chance of 
dealing with the ambiguity.  There are some notes in the man page, and 
use of these in a non-trivial way requires buying into a number of other 
things that one may or may not want to do (and some care may be needed 
to turn this on and off in appropriate situations).
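
As a minimal sketch of the pairwise idea, using plain matrices rather than 
ExpressionSet objects: `bind_rows_pairwise` below is a hypothetical helper, 
not something provided by the methods package. It only shows how a 
two-argument generic such as rbind2 can be folded over any number of 
arguments.

```r
library(methods)

# Hypothetical helper (an assumption for illustration): fold any number of
# arguments through the two-argument generic rbind2(), left to right.
bind_rows_pairwise <- function(...) Reduce(rbind2, list(...))

a  <- matrix(1:4,  nrow = 2)
b  <- matrix(5:8,  nrow = 2)
c3 <- matrix(9:12, nrow = 2)

# Equivalent to rbind2(rbind2(a, b), c3)
res <- bind_rows_pairwise(a, b, c3)
dim(res)
```

A left fold like this is only unambiguous when the pairwise operation gives 
the same answer regardless of grouping, which is part of the care mentioned 
above.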

ETH-Martin, while I understand your desire to make rbind2 etc. some sort 
of standard for S4, as FHCRC-Martin said, that is not the operation that 
is being performed (at least not very often).  We typically have much 
more complex arrangements; in a sense this would be more like merge 
and friends, but still different enough that I think we need to keep our 
notion of combine.
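
To make the difference from a plain row bind concrete, here is a rough 
sketch of merge-like behaviour for matrices with dimnames. The function 
`combine_sketch` is a hypothetical illustration written in base R. It is 
not Biobase's combine method, and it only assumes the behaviour described 
in the quoted tests below: union of row and column names, identical values 
required where the inputs overlap, NA where neither input has a value.

```r
# Hypothetical illustration only; not the Biobase implementation.
combine_sketch <- function(x, y) {
  # Both inputs need row and column names so cells can be matched by name.
  stopifnot(!is.null(rownames(x)), !is.null(colnames(x)),
            !is.null(rownames(y)), !is.null(colnames(y)))

  # Cells present in both inputs must be identical.
  shared_rows <- intersect(rownames(x), rownames(y))
  shared_cols <- intersect(colnames(x), colnames(y))
  if (!identical(x[shared_rows, shared_cols, drop = FALSE],
                 y[shared_rows, shared_cols, drop = FALSE]))
    stop("overlapping regions differ")

  # The result covers the union of names; unfilled cells stay NA.
  rn <- union(rownames(x), rownames(y))
  cn <- union(colnames(x), colnames(y))
  out <- matrix(NA, nrow = length(rn), ncol = length(cn),
                dimnames = list(rn, cn))
  out[rownames(x), colnames(x)] <- x
  out[rownames(y), colnames(y)] <- y
  out
}

m <- matrix(1:20, nrow = 5, dimnames = list(LETTERS[1:5], letters[1:4]))
overlapping <- combine_sketch(m[1:3, ], m[3:5, ])          # shares row "C"
partial     <- combine_sketch(m[1:3, 1:3], m[3:5, 3:4])    # leaves NA corners
```

Unlike rbind2, nothing here assumes that the two inputs have the same 
columns or disjoint rows, which is why a row bind alone does not capture 
the operation.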



Martin Morgan wrote:
> I guess another difference between rbind / cbind / combine and rbind2
> / cbind2 is that the latter only allow for two arguments (perhaps that
> is what the '2' is for?) whereas the former will do their work on any
> number of arguments. Martin
> 
> Martin Maechler <maechler at stat.math.ethz.ch> writes:
> 
>>>>>>> "GS" == Gordon K Smyth <smyth at wehi.edu.au>
>>>>>>>     on Thu, 1 May 2008 10:26:01 +1000 (E. Australia Standard Time) writes:
>>     GS> Hi Martin,
>>     GS> I have only just noticed that the methods package now has generic 
>>     GS> functions rbind2() and cbind2(), which it didn't when the combine() 
>>     GS> function was first created for Biobase.
>>
>>     GS> I think it would be clearer and more elegant to implement rbind2() and 
>>     GS> cbind2() methods for ExpressionSet, and to retire combine() sometime down 
>>     GS> the track (obviously not for the imminent Bioconductor release).  The term 
>>     GS> "combine" is somewhat overused, e.g., it conflicts with the c() function 
>>     GS> in base.
>>
>>     GS> What do you think?
>>
>> (I'm another 'Martin' but nevertheless .. : )
>>
>> I'm strongly in favor of providing  rbind2() and cbind2()
>> methods, base combine() on these for now, and deprecate
>> combine().
>>
>> rbind2() and cbind2() had been introduced exactly for the
>> purpose of providing rbind() / cbind() - like methods for S4
>> objects.
>>
>> Martin Maechler, ETH Zurich
>>
>>
>>     GS> Cheers
>>     GS> Gordon
>>
>>     GS> On Fri, 4 Apr 2008, Martin Morgan wrote:
>>
>>     >> Thanks for the suggestion and examples.
>>     >> 
>>     >> I implemented this in Biobase 1.99.5. It is slightly different from the 
>>     >> version in the beadarraySNP package, in that the content of overlapping 
>>     >> regions of the exprs arrays have to be identical (beadarraySNP allows NAs in 
>>     >> the second matrix).
>>     >> 
>>     >> The functionality I implemented is consistent with the following tests 
>>     >> (hopefully self-explanatory).
>>     >> 
>>     >> data(sample.ExpressionSet)
>>     >> obj <- sample.ExpressionSet
>>     >> 
>>     >> checkEquals(obj, combine(obj[1:250,], obj[251:500,]))
>>     >> checkEquals(obj, combine(obj[,1:13], obj[,14:26]))
>>     >> ## overlapping
>>     >> checkEquals(obj, combine(obj[1:300,], obj[250:500,]))
>>     >> checkEquals(obj, combine(obj[,1:20], obj[,15:26]))
>>     >> 
>>     >> 
>>     >> The implementation introduces a combine method for matrices, which is 
>>     >> consistent with these tests:
>>     >> 
>>     >> ## dimnames
>>     >> m <- matrix(1:20, nrow=5, dimnames=list(LETTERS[1:5], letters[1:4]))
>>     >> checkEquals(m, combine(m, m))
>>     >> checkEquals(m, combine(m[1:3,], m[4:5,]))
>>     >> checkEquals(m, combine(m[,1:3], m[,4, drop=FALSE]))
>>     >> ## overlap
>>     >> checkEquals(m, combine(m[1:3,], m[3:5,]))
>>     >> checkEquals(m, combine(m[,1:3], m[,3:4]))
>>     >> checkEquals(matrix(c(1:3, NA, NA, 6:8, NA, NA,
>>     >> 11:15, NA, NA, 18:20),
>>     >> nrow=5,
>>     >> dimnames=list(LETTERS[1:5], letters[1:4])),
>>     >> combine(m[1:3,1:3], m[3:5, 3:4]))
>>     >> ## row reordering
>>     >> checkEquals(m[c(1,3,5,2,4),], combine(m[c(1,3,5),], m[c(2,4),]))
>>     >> ## Exceptions
>>     >> checkException(combine(m, matrix(0, nrow=5, ncol=4)),
>>     >> silent=TRUE)         # types differ
>>     >> checkException(combine(m, matrix(0L, nrow=5, ncol=4)),
>>     >> silent=TRUE)         # attributes differ
>>     >> m1 <- matrix(1:20, nrow=5)
>>     >> checkException(combine(m, m1), silent=TRUE) # dimnames required
>>     >> 
>>     >> Please let me know if you had something else in mind, or if there are 
>>     >> problems with this.
>>     >> 
>>     >> Martin
>>     >> 
>>     >> Laurent Gautier wrote:
>>     >>> That would be useful.
>>     >>> 
>>     >>> I have been in a situation where it would have been useful, and spent some 
>>     >>> time
>>     >>> with combine as well before writing my own ad-hoc solution.
>>     >>> 
>>     >>> 
>>     >>> 
>>     >>> Laurent
>>     >>> 
>>     >>> 
>>     >>> 2008/4/4, Gordon K Smyth <smyth at wehi.edu.au>:
>>     >>>> An rbind() method or an rbind-like function for ExpressionSet objects
>>     >>>> would be useful.  Any plans for such a function?
>>     >>>> 
>>     >>>> At the moment, an ExpressionSet object can be subsetted by rows or
>>     >>>> columns.  Column subsets can be put back together using combine(), but
>>     >>>> there's no way I think to put row subsets back together.
>>     >>>> 
>>     >>>> BTW, the help page for the generic function combine() includes the idea 
>>     >>>> of
>>     >>>> combining by rows, but this concept is not honoured by the combine method
>>     >>>> for the eSet class.
>>     >>>> 
>>     >>>> Cheers
>>     >>>> Gordon
>>     >>>> 
>>     >>>> _______________________________________________
>>     >>>> Bioc-devel at stat.math.ethz.ch mailing list
>>     >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>     >>>> 
>>     >>> 
>>     >>> 
>>     >> 
>>     >> 
>>     >> -- 
>>     >> Martin Morgan
>>     >> Computational Biology / Fred Hutchinson Cancer Research Center
>>     >> 1100 Fairview Ave. N.
>>     >> PO Box 19024 Seattle, WA 98109
>>     >> 
>>     >> Location: Arnold Building M2 B169
>>     >> Phone: (206) 667-2793
>>     >> 
>>
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


