[BioC] EDASeq within normalization

davide risso risso.davide at gmail.com
Wed Oct 16 02:45:07 CEST 2013


Hi Catarina,

our within-sample normalization is meant to normalize for one factor
at a time.
In our paper (http://www.biomedcentral.com/1471-2105/12/480/) we
showed that, in our data, GC-content effects may be
library-specific and can bias differential expression, while we did not
see such a library-specific effect for gene length. Hence, we propose
to normalize for GC-content and not for length.
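The GC-only workflow proposed above might look like the following minimal sketch. It assumes the yeast example data used in the EDASeq vignette (`geneLevelData`, `yeastGC` and `yeastLength`, available once EDASeq and yeastRNASeq are loaded); the expression filter (mean count > 10) and the object names are illustrative choices, not requirements.

```r
library(EDASeq)
library(yeastRNASeq)

# Example data (assumption: the yeast data sets used in the EDASeq vignette)
data(geneLevelData)
data(yeastGC)
data(yeastLength)

# Keep reasonably expressed genes that have a GC-content annotation
keep <- rowMeans(geneLevelData) > 10
common <- intersect(names(yeastGC), rownames(geneLevelData)[keep])
feature <- data.frame(gc = yeastGC, length = yeastLength)

data <- newSeqExpressionSet(
  counts = as.matrix(geneLevelData[common, ]),
  featureData = feature[common, ],
  phenoData = data.frame(conditions = c(rep("mut", 2), rep("wt", 2)),
                         row.names = colnames(geneLevelData)))

# Within-lane step: adjust for GC content only (one factor at a time)
dataWithin <- withinLaneNormalization(data, "gc", which = "full")

# Between-lane step: make count distributions comparable across lanes
dataNorm <- betweenLaneNormalization(dataWithin, which = "full")
```

The normalized counts can then be retrieved with `normCounts(dataNorm)`.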

If you want to normalize for both GC-content and length, I suggest
having a look at the cqn normalization
(http://bioconductor.org/packages/release/bioc/html/cqn.html), which, if
I remember correctly, accounts for both effects.
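A minimal cqn sketch, assuming the example data that ship with the cqn package (`montgomery.subset`, `sizeFactors.subset`, `uCovar`); with your own data you would pass your count matrix together with per-gene GC content and length, in the same gene order as the rows of the matrix.

```r
library(cqn)

# Example data shipped with cqn (used here only for illustration)
data(montgomery.subset)   # gene-by-sample count matrix
data(sizeFactors.subset)  # per-sample library size factors
data(uCovar)              # per-gene covariates: length and gccontent

# Conditional quantile normalization adjusting for
# GC content (x) and gene length (lengths) in the same model
cqn.subset <- cqn(montgomery.subset,
                  x = uCovar$gccontent,
                  lengths = uCovar$length,
                  sizeFactors = sizeFactors.subset,
                  verbose = FALSE)

# Normalized expression values on the log2 scale
RPKM.cqn <- cqn.subset$y + cqn.subset$offset
```

Unlike the two-step EDASeq approach, cqn estimates both covariate effects together rather than one after the other.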

I also suggest carefully "looking" at the data, e.g. with the EDASeq
functions biasPlot and biasBoxplot, to see whether you need to normalize for
GC-content and/or length effects, because this may vary a lot across
datasets.
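The diagnostics mentioned above could be run roughly as follows. This is a sketch assuming the same yeast example data as in the EDASeq vignette and a simple two-group design ("mut" vs "wt"); the log fold change used for biasBoxplot is a crude illustrative statistic, not a formal test.

```r
library(EDASeq)
library(yeastRNASeq)

data(geneLevelData)
data(yeastGC)
data(yeastLength)

keep <- rowMeans(geneLevelData) > 10
common <- intersect(names(yeastGC), rownames(geneLevelData)[keep])
feature <- data.frame(gc = yeastGC, length = yeastLength)
data <- newSeqExpressionSet(
  counts = as.matrix(geneLevelData[common, ]),
  featureData = feature[common, ],
  phenoData = data.frame(conditions = c(rep("mut", 2), rep("wt", 2)),
                         row.names = colnames(geneLevelData)))

# Smoothed log counts per lane against GC content and length:
# curves that differ between lanes hint at a library-specific bias
biasPlot(data, "gc", log = TRUE)
biasPlot(data, "length", log = TRUE)

# Log fold change (mut vs wt) against binned GC content:
# a trend across bins hints that the bias could affect DE results
mut <- data$conditions == "mut"
lfc <- log(rowMeans(counts(data)[, mut]) + 1) -
       log(rowMeans(counts(data)[, !mut]) + 1)
biasBoxplot(lfc, fData(data)$gc)
```

Running the same plots on `normCounts()` of a normalized object is a quick way to check whether the normalization removed the trend.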

Best regards,
Davide

On Thu, Oct 10, 2013 at 11:05 AM, Catarina Almeida
<catarina.fa at gmail.com> wrote:
> Dear all,
>
> I'm using EDASeq to normalize my RNA-seq data.
>
> But I'm having some trouble understanding how to normalize for gc and for
> length... I got the idea that I needed to do it separately, like this:
>
> # within and between lane normalization for GC #
> dataWithinGC <- withinLaneNormalization(data,"gc",which="full")
> dataNormGC <- betweenLaneNormalization(dataWithinGC,which="full")
>
> # within and between lane normalization for length ##
> dataWithinLength <- withinLaneNormalization(data,"length",which="full")
> dataNormLength <- betweenLaneNormalization(dataWithinLength,which="full")
>
> Am I thinking right? Or should I within-normalize my data for both GC and
> length, like this:
> dataWithin <- withinLaneNormalization(data,"length",which="full")
> dataWithin <- withinLaneNormalization(dataWithin,"gc",which="full")
> dataNorm   <- betweenLaneNormalization(dataWithin,which="full")
>
> Any help is much appreciated!
> C
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



-- 
Davide Risso, PhD
Post Doctoral Scholar
Department of Statistics
University of California, Berkeley
344 Li Ka Shing Center, #3370
Berkeley, CA 94720-3370
E-mail: davide.risso at berkeley.edu


