[Bioc-devel] medianpolish for Affymetrix genechip probesets

Henrik Bengtsson hb at stat.berkeley.edu
Sun Mar 11 16:09:42 CET 2007

On 3/9/07, Ben Bolstad <bmb at bmbolstad.com> wrote:
> Wolfgang,
> The method you suggest would work, but I am a little hesitant to
> recommend you .Call the c code directly. I'd prefer that you just call
> rma() with the background and normalize arguments set to FALSE, though
> that will add a little additional overhead, it won't do anything more
> that the median polish summarization step. That is essentially the
> approach taken in GCRMA (except that it sets background=FALSE,
> normalize=TRUE in its rma() call).
> However, you do raise a larger issue about the multiple implementations
> in various different packages. Mostly the functionality that is
> duplicated consists of the exact same C code in affy, affyPLM and oligo
> (except sometimes a little outdated), with larger differences in
> BufferedMatrixMethods. I did make a promise in the past that I would fix
> this potential maintenance problem by building a package that unify this
> C code in a single location. This way potential developers who wanted to
> use say, quantile normalization or median polish, but did not want to
> "Depends:" on affy (or similar). But building infrastructure is
> decidedly un-sexy, so it is still on my to do list (though perhaps a
> little higher now).

I fully support your idea of providing one code base for all "basic"
modeling and operators used in BioC.   I think affyPLM is a good step
towards this.

May I suggest taking this even step further and provide a low-level
package for single-step modeling / transformations operating on "as
basic data types as possible"?  This package should be used by
developers only to be incorporated inside other functions.  By keeping
the API to use basic data types only (e.g. vectors, matrices, data
frames, lists), it will also be possible to keep it stable over a very
long time.  Such an API does relies much less on the current designs
(e.g. AffyBatch, eSet etc) and other packages (minimal number of
packages in DESCRIPTION's Depends).  It will therefore provide a
common code base that is more stable over time.

Advantages with a low-level API package for single-step methods:
* More likely to be reused.
* Easier to define/document exactly what is done and what is changed
between versions.
* Easier to provide compare algorithms and verify consistency between versions.
* Bug fixes in one package, not many.

This would apply to more packages than affyPLM.  Several of the
probe-level model (PLM) functions in affyPLM could be written so they
accept matrices only, and at a probeset by probeset basis.  For
example, fitPLM.matrix(X, flavor=c("rlm", "median.polish"), ...) could
accept a matrix of probe signals for a single probeset.  This would be
the API closest to the internal code (i.e. the .Call():s).
Higher-level packages could then build up/add fitPLM.AffyBatch(),
fitPLM.eSet() etc.  These higher level packages would evolve with BioC
as new classes/structures are introduced, but fitPLM.matrix() would
remain the same. [I'm using S3 notation because it is shorter to
write; analogously for S4].

I am aware that the above is somewhat the idea of Biobase, but I would
like to take it one step further and make it "more low-level".  For
instance, methods like rowQ(), rowMedians(), and rowMax() in Biobase
could be lifted out to a even lower-level package.  What I am
suggesting should basically never be used explicitly by the end user,
but will be very handy for developers.



PS. Ben, we've chatted about this offline before, but I would like to
bring it up in public too. DS.

> Best,
> Ben
> On Fri, 2007-03-09 at 14:31 +0000, Wolfgang Huber wrote:
> > Hi Rafa, Ben et al.,
> >
> > what is nowadays the prefered implementation of the "medianpolish"
> > probeset summarization / expression value computation method, and which
> > interface to it do you recommend others to use?
> >
> > I have just been looking through affy, BufferedMatrixMethods, oligo,
> > affyPLM, gcrma and found the choice exciting, stimulating and ... large.
> >
> > The reason for the question: vsn can be used to do bg correction and
> > normalization of Affy genechips together with medianpolish to summarize.
> > In the vsn vignette I have so far been recommending to use
> >
> > expresso(x, bg.correct=FALSE, normalize.method="vsn",
> >           pmcorrect.method="pmonly", summary.method="medianpolish")
> >
> > but this can be very slow and I would like to provide a more efficient
> > wrapper à la rma() without reinventing the medianpolish wheel.
> >
> > So far I gather that
> >
> > 1. I should use, in affy/src/rma2.c:
> > SEXP rma_c_call(SEXP PMmat, SEXP MMmat, SEXP ProbeNamesVec,
> >                  SEXP N_probes, SEXP norm_flag)
> > with norm_flag==0, and
> >
> > 2. that I need to exponentiate (2^x) the vsn-transformed data before
> > sending it that way because the first thing you do there is to take log2.
> >
> > Is this correct? Any pointers or comments are welcome.
> >
> > Best wishes
> >    Wolfgang
> >
> > ------------------------------------------------------------------
> > Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
> >
> > _______________________________________________
> > Bioc-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

More information about the Bioc-devel mailing list