[Bioc-devel] eSet classes

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Wed Jun 27 18:13:06 CEST 2007


> Hi Pierre,
>
> I guess your question concerns development and maintenance of your new
> code/package.  I would suggest using data types that are as basic as
> possible (e.g. vectors, matrices, lists) for the inner/core
> implementation of your algorithm(s) and then providing wrapper
> functions for more complex structures (e.g. eSet).  The basic data types in R
> have been around from day one and will remain until the very end,
> whereas other structures (classes) tend to come and go.
>
> This will make it easier for you to maintain and update your algorithm
> and add support for new structures when they come around.  It will also
> make it easier for others to add support for their structures (without
> necessarily having to wrap them up in more complex structure first).
> It will also make it easier for others to port your algorithm to other
> languages or write a native-code implementation.
>
> So, keep it simple and add on top of that.

these remarks are pretty unobjectionable but i will issue one
rebuttal.  it is almost completely inevitable that software
design/development starts with basic data types.

however, the classes are there to motivate/support designs that work
well in practice for bioinformatics/compbio.  one of the
central concerns tackled by the eSet design is that it is
generally inadequate to design for subsetting the matrix of
assay results on its own.  it is important to consider how such
subset operations induce selections in the associated data frame of
sample-level data.
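
a minimal base-R sketch of the problem, using made-up toy data (the
names assay, pheno and keep are illustrative assumptions, not part of
any package): subsetting the assay matrix on its own leaves the
sample-level data frame untouched, so the same selection has to be
repeated by hand.

```r
# toy assay matrix: 4 features (rows) x 3 samples (columns)
assay <- matrix(1:12, nrow = 4,
                dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# sample-level data: one row per sample, row names match the assay columns
pheno <- data.frame(group = c("ctl", "trt", "trt"),
                    row.names = colnames(assay))

# select the treated samples in the assay matrix only
keep <- pheno$group == "trt"
assay_sub <- assay[, keep, drop = FALSE]

# the sample data still describes all three samples, so the two
# objects no longer line up
out_of_sync <- ncol(assay_sub) != nrow(pheno)

# the same selection has to be applied to the sample data by hand
pheno_sub <- pheno[keep, , drop = FALSE]
in_sync <- identical(colnames(assay_sub), rownames(pheno_sub))
```

every function that subsets the matrix has to remember that second
step, and nothing checks that it was done.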

if you support selections of samples in the assay data, the
same selections in the sample data are probably desired.
designing with eSet functionality in mind can reduce the effort
required to accomplish sensible subsetting.  the fact that the
eSet class is closed under subsetting operations (subset an eSet,
you get back an eSet) is also beneficial.  matrices don't have
these characteristics.
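
to illustrate what closure under subsetting buys you, here is a sketch
of a hypothetical toy container (ToySet is invented for this example
and is not Biobase's eSet implementation).  its "[" method applies the
sample selection to both slots and returns an object of the same class.

```r
library(methods)

# hypothetical toy container, NOT the real eSet class
setClass("ToySet", representation(assay = "matrix", pheno = "data.frame"))

# assay columns and sample-data rows must name the same samples, in order
setValidity("ToySet", function(object) {
  if (!identical(colnames(object@assay), rownames(object@pheno)))
    "assay columns and pheno rows must name the same samples"
  else
    TRUE
})

# subsetting selects features with i and samples with j; the sample
# selection is applied to both slots and a ToySet is returned
setMethod("[", "ToySet", function(x, i, j, ..., drop = FALSE) {
  if (missing(i)) i <- seq_len(nrow(x@assay))
  if (missing(j)) j <- seq_len(ncol(x@assay))
  new("ToySet",
      assay = x@assay[i, j, drop = FALSE],
      pheno = x@pheno[j, , drop = FALSE])
})

# toy data: 4 features x 3 samples, plus one row of sample data per sample
assay <- matrix(1:12, nrow = 4,
                dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))
pheno <- data.frame(group = c("ctl", "trt", "trt"),
                    row.names = colnames(assay))

toy <- new("ToySet", assay = assay, pheno = pheno)
sub <- toy[, pheno$group == "trt"]   # one call keeps both slots in sync
```

because the result is again a ToySet, it can be passed to any function
written for the class, and the validity check guards the pairing of
assay columns and sample rows each time an object is built.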

so while it is true that you could 'wrap' low-level software/data to
get the behaviors of the classes, that level of separation may not
be the most effective.  if you are concerned that the next generation
of containers may be incompatible with eSet and friends, note that
we put a fair amount of effort into establishing converters from
exprSet to eSet and that a lot of software and serialized data objects
were converted with little effort.


