[Bioc-devel] Request to add 'normalize' to BiocGenerics

Wolfgang Huber whuber at embl.de
Wed Feb 20 20:35:48 CET 2013


Hi Laurent, LEo

There is a clash between two concepts that seem difficult and messy to reconcile. All I wanted to do with my initial post was to spur an explicit discussion and awareness of this, rather than let it be settled implicitly. I am not yet sure whether that is going to be productive.

1. Package authors can call their functions whatever they like; there is no ontology police, and that's why R has namespaces.
2. There is a 'global' (across packages) generic function, package authors plug their methods into it, and users of multiple packages get a more uniform and less ::-filled user experience.

The BiocGenerics package tries to push concept 2, not for all of R/CRAN, but at least for Bioconductor. Btw, I think this is in the same spirit as the statement in the package guidelines that developers should make use of appropriate existing classes, e.g. ExpressionSet, AnnotatedDataFrame, GRanges, Rle, DNAStringSet.
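
As a rough illustration of concept 2, here is a minimal sketch in R. The generic below only stands in for one that BiocGenerics might export; the classes ImageLike and SpectrumLike and their slots are hypothetical and do not come from any of the packages discussed.

```r
library(methods)

# Hypothetical shared generic, standing in for one exported by BiocGenerics.
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

# Two hypothetical classes, as if defined by two unrelated packages.
setClass("ImageLike", representation(pixels = "numeric"))
setClass("SpectrumLike", representation(intensity = "numeric"))

# Each "package" plugs its own method into the shared generic.
setMethod("normalize", "ImageLike", function(x, ...) {
  rng <- range(x@pixels)
  x@pixels <- (x@pixels - rng[1]) / (rng[2] - rng[1])  # rescale to [0, 1]
  x
})
setMethod("normalize", "SpectrumLike", function(x, ...) {
  x@intensity <- x@intensity / max(x@intensity)  # scale by the maximum
  x
})

# The user calls one name; dispatch picks the method from the class of x.
img <- normalize(new("ImageLike", pixels = c(2, 4, 6)))
spc <- normalize(new("SpectrumLike", intensity = c(1, 5, 10)))
```

With this arrangement the user never needs to write pkgA::normalize() or pkgB::normalize(); the price is that all packages have to agree on the generic's signature.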

	Best wishes
	Wolfgang


On Feb 20, 2013, at 6:47 PM, Laurent Gautier <lgautier at gmail.com> wrote:

> On 2013-02-20 17:32, Schalkwyk, Leonard wrote:
>> 
>> Is this not just an indication that normalize is now a poor choice of a function name?
> 
> If the package authors called the functions "normalize", this means either:
> 1- at least some of the package authors have named a function performing an action that is inappropriately described as "normalize"
> 2- all functions "normalize" do perform an action that can be described with that verb
> 
> Without more details, I'd vote for 2.
> 
> (more below)
> 
>> 
>> LEo
>> 
>> On 20 Feb 2013, at 16:14, Wolfgang Huber wrote:
>> 
>>> Hi
>>> 
>>> is it clear that all these different functions (methods) share similar semantics and enough (conceptually) of their interface?
> 
> Playing the semantic and concept police would come after defining things like ontologies of data processing; I am not sure this should be a priority.
> I'd see working out a minimal common signature that keeps everyone going with minimal fuss as coming first.
> 
>>> 
>>> Wouldn't the implication be that preemptively every possible string of characters should already be defined as a generic function in BiocGenerics?
> 
> No. Otherwise this would probably also mean that R's S4 system should in fact define all possible strings as generics, which by extension would also mean that generic functions do not need to be explicitly declared: since all possible generics would be declared, it is more practical to implicitly assume that any given function already has a generic declared. S4 has notions about implicit generic functions; a starting point is the man page for setGeneric().
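
For readers unfamiliar with the idea mentioned above, here is a small sketch of how an ordinary function can be turned into a generic without writing the generic by hand. The function name scale_by_max is hypothetical and chosen only for illustration; see the setGeneric() man page for the precise rules.

```r
library(methods)

# An ordinary function (hypothetical name, for illustration only).
scale_by_max <- function(x, ...) x / max(x)

was_generic <- isGeneric("scale_by_max")   # not a generic yet

# Calling setGeneric() with only the name derives a generic from the
# existing function, which is kept as the default method.
setGeneric("scale_by_max")

is_generic <- isGeneric("scale_by_max")

# Existing calls keep working through the default method.
res <- scale_by_max(c(1, 2, 4))
```

The generic is created only when someone asks for it, so there is no need to declare every conceivable name in advance.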
> 
> 
> 
>>> 
>>> 	Best wishes
>>> 	Wolfgang
>>> 
>>> On Feb 20, 2013, at 11:04 AM, Laurent Gatto <lg390 at cam.ac.uk> wrote:
>>> 
>>>> On 19 February 2013 22:44, Hervé Pagès <hpages at fhcrc.org> wrote:
>>>>> Hi Laurent, and maintainers of packages with a normalize() function,
>>>>> 
>>>>> 
>>>>> On 02/15/2013 04:28 AM, Laurent Gatto wrote:
>>>>>> A quick (and incomplete) manual search using
>>>>>> http://search.bioconductor.jp/ suggests the following usage of
>>>>>> normalize:
>>>>>> 
>>>>>> As a function:
>>>>>> xps::normalize
>>>>>> codelink::normalize
>>>>>> EBImage::normalize
>>>>>> diffGeneAnalysis::normalize
>>>>>> 
>>>>>> Defining a generic and methods:
>>>>>> oligo::normalize
>>>>>> flowCore::normalize
>>>>>> MSnbase::normalize
>>>>>> isobar::normalize
>>>>>> 
>>>>>> and
>>>>>> 
>>>>>> several normalize\.[*+] functions
>>>>>> 
>>>>>> Would it be reasonable to add a normalize generic definition to
>>>>>> BiocGenerics?  The generic definitions in the above packages differ,
>>>>>> however.
>>>>> 
>>>>> Sounds good to me.
>>>>> 
>>>>> However, since the various normalize() functions have different
>>>>> signatures, we need to agree on what the signature of the generic
>>>>> in BiocGenerics should be.
>>>>> 
>>>>> Here is a summary of the situation:
>>>>> 
>>>>> ** xps package: normalize() is an ordinary function with the
>>>>>    following arg list:
>>>>> 
>>>>>      normalize(xps.data, filename=character(0), filedir=getwd(),
>>>>>                tmpdir="", update=FALSE, select="all", method="mean",
>>>>>                option="transcript:all", logbase="0", exonlevel="",
>>>>>                refindex=0, refmethod="mean", params=list(0.02, 0),
>>>>>                add.data=TRUE, verbose=TRUE)
>>>>> 
>>>>>    The package also defines normalize.constant(), normalize.lowess(),
>>>>>    normalize.quantiles(), normalize.supsmu(), which are also ordinary
>>>>>    functions (not S3 methods), and have similar but slightly
>>>>>    different arg lists.
>>>>> 
>>>>> ** codelink package: Ordinary function with the following args:
>>>>> 
>>>>>      normalize(object, method="quantiles", log.it=TRUE,
>>>>>                preserve=FALSE, weights=NULL, verbose=FALSE)
>>>>> 
>>>>> ** EBImage package: Ordinary function with the following args:
>>>>> 
>>>>>      normalize(x, separate=TRUE, ft=c(0, 1))
>>>>> 
>>>>> ** diffGeneAnalysis package: Ordinary function with the following
>>>>>    args:
>>>>> 
>>>>>      normalize(rawdata, numSlides, ctrl, expm, ctrlbg=0.30,
>>>>>                expmbg=0.30)
>>>>> 
>>>>> ** deepSNV package: S4 generic with the following args:
>>>>> 
>>>>>      normalize(test, control, ...)
>>>>> 
>>>>> ** isobar package: S4 generic with the following args:
>>>>> 
>>>>>      normalize(x, f=median, target="intensity", exclude.protein=NULL,
>>>>>                   use.protein=NULL, f.doapply=TRUE, log=TRUE,
>>>>>                   channels=NULL, na.rm=FALSE, per.file=TRUE, ...)
>>>>> 
>>>>> ** affy package: S4 generic with the following args:
>>>>> 
>>>>>      normalize(object, ...)
>>>>> 
>>>>> ** flowCore package: S4 generic with the following args:
>>>>> 
>>>>>      normalize(data, x, ...)
>>>>> 
>>>>> ** MSnbase package: S4 generic with the following args:
>>>>> 
>>>>>      normalize(object, method, ...)
>>>>> 
>>>>> ** oligo package: S4 generic with the following args:
>>>>> 
>>>>>      normalize(object, method=normalizationMethods(),
>>>>>                copy=TRUE, subset=NULL,
>>>>>                target='core', verbose=TRUE, ...)
>>>>> 
>>>>> So it looks like the greatest common factor is normalize(x, ...).
>>>>> Not too surprising for a generic that covers such a wide range of
>>>>> related but slightly different concepts / algorithms.
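
A minimal sketch of how a package-specific argument list could sit under a generic with the signature normalize(x, ...). The class ToyAssay and the method argument below are hypothetical; they only mimic the style of the arg lists summarized above and are not taken from any of those packages.

```r
library(methods)

# The proposed "greatest common factor" signature.
setGeneric("normalize", function(x, ...) standardGeneric("normalize"))

# Hypothetical container class, for illustration only.
setClass("ToyAssay", representation(values = "numeric"))

# Because the generic has '...', a method may add its own named arguments
# after 'x' and still conform to the generic.
setMethod("normalize", "ToyAssay", function(x, method = "mean", ...) {
  centre <- switch(method,
                   mean   = mean(x@values),
                   median = stats::median(x@values),
                   stop("unknown method: ", method))
  x@values <- x@values - centre   # centre the values
  x
})

a <- new("ToyAssay", values = c(1, 2, 9))
by_mean   <- normalize(a)@values                     # uses the default method
by_median <- normalize(a, method = "median")@values  # package-specific argument
```

Dispatch happens on x only; everything else travels through '...', which is why such a thin signature can cover very different argument lists.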
>>>>> 
>>>>> One technical difficulty though is that, even though almost all these
>>>>> functions seem to take an S4 object as their 1st arg, some of them
>>>>> don't:
>>>>> 
>>>>> (a) For EBImage::normalize(), 'x' can be an ordinary array in
>>>>>     addition to an Image object.
>>>>> 
>>>>> (b) For diffGeneAnalysis::normalize(), 'rawdata' is an ordinary
>>>>>     matrix.
>>>>> 
>>>>> (c) For deepSNV::normalize(), 'test' can be an ordinary matrix
>>>>>     in addition to a deepSNV object.
>>>>> 
>>>>> (d) For oligo::normalize(), 'object' can be an ordinary matrix
>>>>>     in addition to a FeatureSet object.
>>>>> 
>>>>> So how can we disambiguate when the first arg is an ordinary matrix?
>>>>> IMO normalize() should fail in that case i.e. no package should define
>>>>> methods for ordinary arrays or matrices. Not ideal but better than the
>>>>> current situation where what is returned depends on which package was
>>>>> loaded last.
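
The "loaded last" problem can be imitated without installing anything. In this sketch two environments stand in for two packages that each export a normalize() accepting a plain matrix; the names toy:pkgA and toy:pkgB and both function bodies are made up for illustration.

```r
# Two environments standing in for two packages (hypothetical content).
pkgA <- new.env()
pkgB <- new.env()
assign("normalize", function(x, ...) x / max(x), envir = pkgA)
assign("normalize", function(x, ...) x - mean(x), envir = pkgB)

m <- matrix(c(1, 2, 3, 4), nrow = 2)

# Attach A, then B: B is found first on the search path and masks A.
attach(pkgA, name = "toy:pkgA", warn.conflicts = FALSE)
attach(pkgB, name = "toy:pkgB", warn.conflicts = FALSE)
res_b_last <- normalize(m)
detach("toy:pkgB", character.only = TRUE)
detach("toy:pkgA", character.only = TRUE)

# Same call, opposite attach order: now A masks B.
attach(pkgB, name = "toy:pkgB", warn.conflicts = FALSE)
attach(pkgA, name = "toy:pkgA", warn.conflicts = FALSE)
res_a_last <- normalize(m)
detach("toy:pkgA", character.only = TRUE)
detach("toy:pkgB", character.only = TRUE)
```

The identical call normalize(m) gives two different answers depending only on the attach order, which is the situation the proposal tries to avoid by refusing plain arrays and matrices.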
>>>>> 
>>>>> I could put normalize(x, ...) in BiocGenerics if nobody objects, but
>>>>> that's all. I don't have time to fix the 10 packages that this change
>>>>> will affect. However, I'd rather wait for the beginning of the Bioc 2.13
>>>>> devel cycle (April) for such a change. For some packages like
>>>>> diffGeneAnalysis (which doesn't use S4 at all), that will probably
>>>>> require a significant amount of changes since it will need to pass
>>>>> the data to normalize in an S4 container instead of an ordinary matrix.
>>>>> 
>>>>> Comments and suggestions are welcome.
>>>> Fine by me.
>>>> 
>>>> Laurent
>>>> 
>>>>> Thanks,
>>>>> H.
>>>>> 
>>>>>> Best wishes,
>>>>>> 
>>>>>> Laurent
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>> 
>>>>> --
>>>>> Hervé Pagès
>>>>> 
>>>>> Program in Computational Biology
>>>>> Division of Public Health Sciences
>>>>> Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N, M1-B514
>>>>> P.O. Box 19024
>>>>> Seattle, WA 98109-1024
>>>>> 
>>>>> E-mail: hpages at fhcrc.org
>>>>> Phone:  (206) 667-5791
>>>>> Fax:    (206) 667-1319
> 


