[BioC] Missing Values after cyclic loess in limma
Wolfgang Huber
whuber at embl.de
Sun Nov 4 10:39:37 CET 2012
Hi Gordon
thanks for the many good points.
Literature: I don't have an overview of the uses or citations of vsn, and benchmarking normalisation methods is, as we know, a complex topic, but below I paste some recent references where vsn was used for multiple (dozens to hundreds of) single-colour arrays.
Affine linear: vsn transforms each array's data x with the transformation glog(a*x+b), with array-specific parameters a and b and an overall function (the same for all arrays) glog(y) = log2((y + sqrt(y^2+1))/2). The array-specific part is affine linear.
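To make the affine-linear point concrete, here is a minimal sketch in base R. It is not the vsn package's own code: the function names glog2 and affine_glog and the values of a and b are made up for illustration, whereas vsn estimates a and b from the data.

```r
# Shared transformation, the same for all arrays:
# glog(y) = log2((y + sqrt(y^2 + 1)) / 2)
glog2 <- function(y) log2((y + sqrt(y^2 + 1)) / 2)

# Array-specific affine-linear step a*x + b, followed by the shared glog.
# a and b are hypothetical values here, not estimates.
affine_glog <- function(x, a, b) glog2(a * x + b)

# Example intensities, including a negative and a zero value
x <- c(-50, 0, 50, 1000, 50000)
h <- affine_glog(x, a = 0.01, b = 0.5)
print(h)

# Unlike log2, glog is defined at zero and for negative arguments,
# and it approaches log2(y) for large y.
print(glog2(1e6) - log2(1e6))
```

The only array-dependent part is the pair (a, b); everything nonlinear sits in the shared glog2.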
Cyclic loess is an iterative algorithm (as is vsn), and its implementation in limma by default stops after 3 iterations regardless of whether convergence was reached. While I concur that this is numerically robust, isn't the lack of a data-dependent convergence diagnostic a reason to worry?
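What I mean by a data-dependent diagnostic can be sketched as follows. This is a toy pairwise cyclic loess in base R, using stats::loess, and it is not limma's implementation. The function name cyclic_loess_sketch, the arguments max_iter, tol and span, and the simulated data are all assumptions for illustration.

```r
# Sketch only: pairwise cyclic loess on a log-scale matrix (rows = probes,
# columns = arrays) that records how much each sweep changed the data.
cyclic_loess_sketch <- function(x, max_iter = 20, tol = 1e-3, span = 0.7) {
  n <- ncol(x)
  change <- numeric(0)
  for (iter in seq_len(max_iter)) {
    x_old <- x
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        m <- x[, i] - x[, j]          # log-ratio
        a <- (x[, i] + x[, j]) / 2    # average log-intensity
        f <- fitted(loess(m ~ a, span = span))
        x[, i] <- x[, i] - f / 2      # split the fitted trend between both arrays
        x[, j] <- x[, j] + f / 2
      }
    }
    change <- c(change, max(abs(x - x_old)))
    if (change[iter] < tol) break     # data-dependent stopping rule
  }
  list(E = x, change = change, converged = change[length(change)] < tol)
}

# Simulated log-intensities for three arrays with different offsets and scales
set.seed(1)
truth <- rnorm(500, mean = 8, sd = 2)
x <- cbind(truth, truth + 0.5, 1.1 * truth - 1) + rnorm(1500, sd = 0.1)

fit <- cyclic_loess_sketch(x)
print(fit$change)      # largest adjustment made in each sweep
print(fit$converged)   # TRUE only if the last sweep changed the data by less than tol
```

Reporting the vector of per-sweep changes would let the user see whether a fixed number of iterations was enough for their data.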
I also have a question for you: there is nothing intrinsically bivariate (1D regressor, 1D response) about local regression. Multivariate approaches have been proposed (e.g. Kepler, Crosby, Morgan in Genome Biology 2002), and good implementations exist in R (e.g. the locfit package). So why is it so popular to do this pairwise, with the obvious drawback of n^2 complexity?
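For a sense of scale, one pass over all pairs of n arrays needs n(n-1)/2 loess fits. The array counts below are arbitrary examples:

```r
# Number of pairwise fits in one pass over all pairs of n arrays
n <- c(10, 50, 100)
pairs_per_sweep <- choose(n, 2)   # n * (n - 1) / 2
print(data.frame(arrays = n, pairwise_fits = pairs_per_sweep))
```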
Best wishes
Wolfgang
Zhenyu Xu*, Wu Wei*, Julien Gagneur, Fabiana Perocchi, Sandra Clauder-Muenster, Jurgi Camblong, Elisa Guffanti, Francoise Stutz, Wolfgang Huber, and Lars M. Steinmetz. Bidirectional promoters generate pervasive transcription in yeast. Nature, 457(7232):1033-1037, 2009.
Zhenyu Xu*, Wu Wei*, Julien Gagneur*, Sandra Clauder-Münster, Miłosz Smolik, Wolfgang Huber, and Lars M. Steinmetz. Antisense expression increases gene expression variability and locus interdependency. Molecular Systems Biology, 7, 2011.
E. Benito, L. M. Valor, M. Jimenez-Minchan, W. Huber, and A. Barco. cAMP response element-binding protein is a primary hub of activity-driven neuronal gene expression. Journal of Neuroscience, 31:18237-18250, 2011.
Ramona Schmid, Patrick Baum, Carina Ittrich, Katrin Fundel-Clemens, Wolfgang Huber, Benedikt Brors, Roland Eils, Andreas Weith, Detlev Mennerich, and Karsten Quast. Comparison of normalization methods for Illumina BeadChip(R) HumanHT-12 v3. BMC Genomics, 11:349, 2010.
On Nov 4, 2012, at 1:05 AM, Gordon K Smyth <smyth at wehi.EDU.AU> wrote:
> The problem had nothing to do with the loess function.
>
> I do not know of any objective grounds by which one could claim vsn to be more numerically robust than loess. The former requires iterative parameter estimation whereas loess is a closed-form calculation requiring nothing more complex than linear regression.
>
> The literature does indeed suggest that cyclic loess would be an obvious choice in high DE situations, which is the context here. There is no literature that I know of supporting vsn in this context.
>
> Affine functions are linear transformations with an intercept. Vsn is not a linear transformation while, ironically, the local polynomials used by loess are.
>
> Gordon
>
>> Date: Fri, 2 Nov 2012 19:45:45 +0100
>> From: Wolfgang Huber <whuber at embl.de>
>> To: "Claus Mayer [guest]" <guest at bioconductor.org>
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] Missing Values after cyclic loess in limma
>>
>> Hi Claus
>>
>> if there is a chance that affine functions might already do a good enough job for you, compared to loess' local polynomials, then "vsn" might be an option for you, which is intended to be more numerically robust.
>>
>> Best wishes
>> Wolfgang
>>
>> On Nov 2, 2012, at 6:45 PM, "Claus Mayer [guest]" <guest at bioconductor.org> wrote:
>>
>>>
>>> Hello,
>>>
>>> I am just working on my first ever single channel Agilent array data set. Because I do expect large changes in differential expression I wanted to use the cyclic loess normalisation within limma rather than quantile normalisation. I used the default settings i.e.
>>>
>>> y<-normalizeBetweenArrays(x,method="cyclicloess")
>>>
>>> where x is the EListRaw object. As expected this took a while, but to my surprise it produced hundreds of missing values for each array, as indicated by the message
>>>
>>> Warning message:
>>> In log2(Recall(object$E, method = method, ...)) : NaNs produced
>>>
>>> I checked the raw values which are all well above 0 and include no NAs. I also did not use any background correction, so I don't quite understand why logging should produce any missing values. I had assumed that the method would first log and then apply the cyclic loess algorithm, which in itself shouldn't produce any NAs either. Have I misunderstood something basic here?
>>>
>>> Thanks,
>>>
>>> Claus
>>>
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 2.13.0 (2011-04-13)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> other attached packages:
>>> [1] limma_3.8.2
>>>
>>> --
>