[Bioc-devel] Request to add 'normalize' to BiocGenerics

Hervé Pagès hpages at fhcrc.org
Tue Feb 19 23:44:01 CET 2013


Hi Laurent, and maintainers of packages with a normalize() function,

On 02/15/2013 04:28 AM, Laurent Gatto wrote:
> A quick (and incomplete) manual search using
> http://search.bioconductor.jp/ suggests the following usage of
> normalize:
>
> As a function:
> xps::normalize
> codelink::normalize
> EBImage::normalize
> diffGeneAnalysis::normalize
>
> Defining a generic and methods:
> oligo::normalize
> flowCore::normalize
> MSnbase::normalize
> isobar::normalize
>
> and
>
> several normalize\.[*+] functions
>
> Would it be reasonable to add a normalize generic definition to
> BiocGenerics?  The generic definitions in the above packages differ,
> however.

Sounds good to me.

However, since the various normalize() functions have different
signatures, we need to agree on what the signature of the generic
in BiocGenerics should be.

Here is a summary of the situation:

   ** xps package: normalize() is an ordinary function with the
      following arg list:

        normalize(xps.data, filename=character(0), filedir=getwd(),
                  tmpdir="", update=FALSE, select="all", method="mean",
                  option="transcript:all", logbase="0", exonlevel="",
                  refindex=0, refmethod="mean", params=list(0.02, 0),
                  add.data=TRUE, verbose=TRUE)

      The package also defines normalize.constant(), normalize.lowess(),
      normalize.quantiles(), normalize.supsmu(), which are also ordinary
      functions (not S3 methods), and have similar but slightly
      different arg lists.

   ** codelink package: Ordinary function with the following args:

        normalize(object, method="quantiles", log.it=TRUE,
                  preserve=FALSE, weights=NULL, verbose=FALSE)

   ** EBImage package: Ordinary function with the following args:

        normalize(x, separate=TRUE, ft=c(0, 1))

   ** diffGeneAnalysis package: Ordinary function with the following
      args:

        normalize(rawdata, numSlides, ctrl, expm, ctrlbg=0.30,
                  expmbg=0.30)

   ** deepSNV package: S4 generic with the following args:

        normalize(test, control, ...)

   ** isobar package: S4 generic with the following args:

        normalize(x, f=median, target="intensity", exclude.protein=NULL,
                  use.protein=NULL, f.doapply=TRUE, log=TRUE,
                  channels=NULL, na.rm=FALSE, per.file=TRUE, ...)

   ** affy package: S4 generic with the following args:

        normalize(object, ...)

   ** flowCore package: S4 generic with the following args:

        normalize(data, x, ...)

   ** MSnbase package: S4 generic with the following args:

        normalize(object, method, ...)

   ** oligo package: S4 generic with the following args:

        normalize(object, method=normalizationMethods(),
                  copy=TRUE, subset=NULL,
                  target='core', verbose=TRUE, ...)

So it looks like the greatest common factor is normalize(x, ...).
Not too surprising for a generic that covers such a wide range of
related but slightly different concepts / algorithms.
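For illustration only, here is a minimal sketch of what a generic with that signature could look like, using nothing but the standard 'methods' package. The ToyIntensities class and its method are hypothetical, invented here to show that a method can still declare its own extra arguments (here 'method') after 'x', as long as the generic passes them through '...':

```r
library(methods)

## Hypothetical sketch of a generic with the minimal common signature
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

## Toy S4 container, invented for illustration only
setClass("ToyIntensities", slots = c(assay = "matrix"))

## The method adds its own 'method' argument; the generic only fixes 'x'
setMethod("normalize", "ToyIntensities",
    function(x, method = c("median", "mean"), ...) {
        method <- match.arg(method)
        ## Per-column centre, computed according to the chosen method
        centre <- switch(method,
                         median = apply(x@assay, 2, median),
                         mean   = colMeans(x@assay))
        ## Subtract each column's centre from that column
        x@assay <- sweep(x@assay, 2, centre, "-")
        x
    })
```

With this design each package keeps its own extra arguments and defaults; only the dispatch argument 'x' is shared across all methods.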

One technical difficulty though is that, even though almost all these
functions seem to take an S4 object as their 1st arg, some of them
don't:

   (a) For EBImage::normalize(), 'x' can be an ordinary array in
       addition to an Image object.

   (b) For diffGeneAnalysis::normalize(), 'rawdata' is an ordinary
       matrix.

   (c) For deepSNV::normalize(), 'test' can be an ordinary matrix
       in addition to a deepSNV object.

   (d) For oligo::normalize(), 'object' can be an ordinary matrix
       in addition to a FeatureSet object.

So how can we disambiguate when the first arg is an ordinary matrix?
IMO normalize() should fail in that case, i.e. no package should define
methods for ordinary arrays or matrices. Not ideal, but better than the
current situation where what is returned depends on which package was
loaded last.
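The load-order problem can be reproduced without installing any of these packages. In the purely illustrative sketch below, two hypothetical packages are simulated as environments attached to the search path. Each provides an ordinary normalize() with a different meaning, and whichever is attached last masks the other:

```r
## Two hypothetical "packages", simulated as environments, each providing
## an ordinary (non-generic) function called normalize()
pkgA <- new.env()
pkgA$normalize <- function(x, ...) x / max(x)   # scale so the max is 1
pkgB <- new.env()
pkgB$normalize <- function(x, ...) x - mean(x)  # centre on the mean

v <- c(2, 4, 6)

attach(pkgA, name = "fakePkgA")
attach(pkgB, name = "fakePkgB")  # attached last, so it masks fakePkgA
resultB <- normalize(v)          # resolves to pkgB's version

detach("fakePkgB")
resultA <- normalize(v)          # now resolves to pkgA's version
detach("fakePkgA")
```

The same call on the same data gives different answers depending only on the order in which the environments were attached, which is the masking R reports when two attached packages export the same name.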

I could put normalize(x, ...) in BiocGenerics if nobody objects, but
that's all: I don't have time to fix the 10 packages that this change
will affect. Also, I'd rather wait for the beginning of the Bioc 2.13
devel cycle (April) before making such a change. For some packages like
diffGeneAnalysis (which doesn't use S4 at all), this will probably
require significant changes, since they will need to pass the data to
normalize() in an S4 container instead of an ordinary matrix.
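A minimal sketch of that change, assuming the generic normalize(x, ...) with no method defined for ordinary matrices. RawIntensities is a hypothetical container class, and the column scaling is a placeholder, not diffGeneAnalysis's actual algorithm:

```r
library(methods)

## Generic with the proposed common signature
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

## Hypothetical S4 container wrapping the ordinary matrix of raw data
setClass("RawIntensities", slots = c(values = "matrix"))

## Only the container gets a method; nothing is defined for "matrix"
setMethod("normalize", "RawIntensities", function(x, ...) {
    ## Placeholder normalization: scale each column to sum to 1
    x@values <- sweep(x@values, 2, colSums(x@values), "/")
    x
})

raw <- matrix(c(1, 3, 2, 2), ncol = 2)

## Calling the generic on the bare matrix fails with a dispatch error
err <- tryCatch(normalize(raw), error = function(e) conditionMessage(e))

## Wrapping the same data in the container makes dispatch unambiguous
res <- normalize(new("RawIntensities", values = raw))
```

The error on the bare matrix is the intended behaviour: it fails loudly instead of silently picking whichever package's function was loaded last.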

Comments and suggestions are welcome.

Thanks,
H.

>
> Best wishes,
>
> Laurent
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


