[Bioc-devel] rbind for ExpressionSet objects?

Tue May 6 15:26:17 CEST 2008

2008/5/6 Gordon K Smyth <smyth at wehi.edu.au>:
> Thanks to both Martins for their replies.
>
>  I hadn't realised before that combine() is actually a merge-like function,
> although admittedly a more careful reading of the help page would have warned
> me.  The name did confuse me: combine() is unlike c() in the base package
> but instead very similar to merge().
>
>  I really did want genuine rbind() and cbind() functions.  I now see that
> combine() does more than I want, and the possibility of unwanted effects
> gives me less trust in it for my work.
>
>  There is some difference in philosophy here I think.  I think of microarray
> data objects as analogous to matrices, whereas combine() is viewing them as
> analogous to data.frames.  It makes sense to "merge" data.frames, but not
> matrices, because row and column names might not be unique.  I am quite
> happy to entertain microarray objects with repeated row or column names.
> Even if I weren't, I would find it hard to ensure that sample names are
> unique across different experimental runs, especially considering that the
> names may be set by data files and software which are not under my control.

There are always several ways to skin a cat, but the data structure
proposed for microarray data is starting to be rather handy and saves
one the trouble of reinventing the wheel (and I can tell you that I am
of the picky kind).
It can probably do a lot of what you need, and take care of the
bookkeeping for you.
For example, the featureData slot can accommodate repeated names in
one of its columns if you have any need for that.
As for not having unique sample names, I can tell you that you *are*
implicitly having them: the position of each column in a matrix is a
way to identify your data. Making whatever you have unique is only a
matter of using a sequence of integers, for example.
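
A minimal sketch of that last point, using base R only. The matrix x and
its clashing column names are made up for illustration, and nothing here
is specific to Biobase:

```r
# Hypothetical expression matrix whose sample names clash across two runs
x <- matrix(rnorm(12), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"),
                            c("s1", "s2", "s1", "s2")))

# Option 1: use the column positions themselves as the names
by_position <- x
colnames(by_position) <- as.character(seq_len(ncol(x)))

# Option 2: keep the original names and add a suffix to the repeats
by_suffix <- x
colnames(by_suffix) <- make.unique(colnames(x))

colnames(by_position)
colnames(by_suffix)
```

The second option keeps the link to the original file names, which may be
easier to trace back later; the first is the plain "sequence of integers".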

Hoping this helps,

L.

>  All the best
>  Gordon
>
>
>
>  On Mon, 5 May 2008, Martin Morgan wrote:
>
>
> > Martin Maechler <maechler at stat.math.ethz.ch> writes:
> >
> > > >>>>> "GS" == Gordon K Smyth <smyth at wehi.edu.au>
> > > >>>>>     on Thu, 1 May 2008 10:26:01 +1000 (E. Australia Standard Time) writes:
> > >
> > >    GS> Hi Martin,
> > >    GS> I have only just noticed that the methods package now has generic
> > >    GS> functions rbind2() and cbind2(), which it didn't when the combine()
> > >    GS> function was first created for Biobase.
> > >
> > >    GS> I think it would be clearer and more elegant to implement rbind2() and
> > >    GS> cbind2() methods for ExpressionSet, and to retire combine() sometime down
> > >    GS> the track (obviously not for the imminent Bioconductor release). The term
> > >    GS> "combine" is somewhat overused, e.g., it conflicts with the c() function
> > >    GS> in base.
> > >
> > >    GS> What do you think?
> > >
> > > (I'm another 'Martin' but nevertheless .. : )
> > >
> > > I'm strongly in favor of providing rbind2() and cbind2()
> > > methods, basing combine() on these for now, and deprecating
> > > combine().
> > >
> >
> > (sorry Gordon for not getting back to you on this, I've actually been
> > mulling it over a bit...)
> >
> > combine does more (or is supposed to, anyway) than rbind or cbind,
> > appending non-overlapping rows and columns simultaneously (and
> > introducing NAs in the implied missing values where features present
> > in only one eSet have to be aligned with samples present only in the
> > other, and vice versa).
> >
> > In mulling this over I realized a bug in the current combine (fixed in
> > devel), and the more-or-less overly restrictive description of
> > combine,eSet,eSet-method.
> >
> > I'm kind of wondering whether Gordon's original question was prompted
> > by a malfunctioning / misleading combine, or by a desire to have a
> > more consistent interface to rbind / cbind?
> >
> >
> > > rbind2() and cbind2() had been introduced exactly for the
> > > purpose of providing rbind() / cbind() - like methods for S4
> > > objects.
> > >
> >
> > I know 'why', but it's a little too bad that, for this design goal,
> > rbind2 is not named, well, rbind.
> >
> > I have implemented rbind2 / cbind2 in my local copy of Biobase, and
> > will likely commit over the next day or so. The code is basically
> >
> > setMethod("cbind2",
> >         signature=signature(x="eSet", y="eSet"),
> >         function(x, y) {
> >           ## check that featureNames the same, sampleNames differ,
> >           ## and then...
> >           combine(x, y)
> >         })
> >
> > Gordon, is this the effect you're looking for?
> >
> > Martin Morgan
> >
> >
> > > Martin Maechler, ETH Zurich
> > >
> > >
> > >    GS> Cheers
> > >    GS> Gordon
> > >
> > >    GS> On Fri, 4 Apr 2008, Martin Morgan wrote:
> > >
> > >   >> Thanks for the suggestion and examples.
> > >   >>
> > >   >> I implemented this in Biobase 1.99.5. It is slightly different from the
> > >   >> version in the beadarraySNP package, in that the content of overlapping
> > >   >> regions of the exprs arrays has to be identical (beadarraySNP allows NAs in
> > >   >> the second matrix).
> > >   >>
> > >   >> The functionality I implemented is consistent with the following tests
> > >   >> (hopefully self-explanatory).
> > >   >>
> > >   >> data(sample.ExpressionSet)
> > >   >> obj <- sample.ExpressionSet
> > >   >>
> > >   >> checkEquals(obj, combine(obj[1:250,], obj[251:500,]))
> > >   >> checkEquals(obj, combine(obj[,1:13], obj[,14:26]))
> > >   >> ## overlapping
> > >   >> checkEquals(obj, combine(obj[1:300,], obj[250:500,]))
> > >   >> checkEquals(obj, combine(obj[,1:20], obj[,15:26]))
> > >   >>
> > >   >>
> > >   >> The implementation introduces a combine method for matrices, which is
> > >   >> consistent with these tests:
> > >   >>
> > >   >> ## dimnames
> > >   >> m <- matrix(1:20, nrow=5, dimnames=list(LETTERS[1:5], letters[1:4]))
> > >   >> checkEquals(m, combine(m, m))
> > >   >> checkEquals(m, combine(m[1:3,], m[4:5,]))
> > >   >> checkEquals(m, combine(m[,1:3], m[,4, drop=FALSE]))
> > >   >> ## overlap
> > >   >> checkEquals(m, combine(m[1:3,], m[3:5,]))
> > >   >> checkEquals(m, combine(m[,1:3], m[,3:4]))
> > >   >> checkEquals(matrix(c(1:3, NA, NA, 6:8, NA, NA,
> > >   >> 11:15, NA, NA, 18, NA, NA),
> > >   >> nrow=5,
> > >   >> dimnames=list(LETTERS[1:5], letters[1:4])),
> > >   >> combine(m[1:3,1:3], m[3:5, 3:4]))
> > >   >> ## row reordering
> > >   >> checkEquals(m[c(1,3,5,2,4),], combine(m[c(1,3,5),], m[c(2,4),]))
> > >   >> ## Exceptions
> > >   >> checkException(combine(m, matrix(0, nrow=5, ncol=4)),
> > >   >> silent=TRUE)         # types differ
> > >   >> checkException(combine(m, matrix(0L, nrow=5, ncol=4)),
> > >   >> silent=TRUE)         # attributes differ
> > >   >> m1 <- matrix(1:20, nrow=5)
> > >   >> checkException(combine(m, m1), silent=TRUE) # dimnames required
> > >   >>
> > >   >> Please let me know if you had something else in mind, or if there are
> > >   >> problems with this.
> > >   >>
> > >   >> Martin
> > >   >>
> > >   >> Laurent Gautier wrote:
> > >   >>> That would be useful.
> > >   >>>
> > >   >>> I have been in a situation where it would have been useful, and spent
> > >   >>> some time with combine as well before writing my own ad-hoc solution.
> > >   >>>
> > >   >>>
> > >   >>>
> > >   >>> Laurent
> > >   >>>
> > >   >>>
> > >   >>> 2008/4/4, Gordon K Smyth <smyth at wehi.edu.au>:
> > >   >>>> An rbind() method or an rbind-like function for ExpressionSet objects
> > >   >>>> would be useful.  Any plans for such a function?
> > >   >>>>
> > >   >>>> At the moment, an ExpressionSet object can be subsetted by rows or
> > >   >>>> columns.  Column subsets can be put back together using combine(), but
> > >   >>>> there's no way I think to put row subsets back together.
> > >   >>>>
> > >   >>>> BTW, the help page for the generic function combine() includes the idea
> > >   >>>> of combining by rows, but this concept is not honoured by the combine
> > >   >>>> method for the eSet class.
> > >   >>>>
> > >   >>>> Cheers
> > >   >>>> Gordon
> > >   >>>>
> > >   >>>> _______________________________________________
> > >   >>>> Bioc-devel at stat.math.ethz.ch mailing list
> > >   >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >   >>>>
> > >   >>>
> > >   >>>
> > >   >>
> > >   >>
> > >   >> --
> > >   >> Martin Morgan
> > >   >> Computational Biology / Fred Hutchinson Cancer Research Center
> > >   >> 1100 Fairview Ave. N.
> > >   >> PO Box 19024 Seattle, WA 98109
> > >   >>
> > >   >> Location: Arnold Building M2 B169
> > >   >> Phone: (206) 667-2793
> > >   >>
> > >
> > >
> >
> >
>
>