[BioC] EDASeq within normalization

davide risso risso.davide at gmail.com
Wed Oct 23 23:57:20 CEST 2013


Hi Catarina,

the answer to the question "what is the best normalization method?"
really depends on the specific dataset that you are looking at. We
suggest performing a careful exploratory data analysis; the plots in
the EDASeq package are a good starting point, but you may also want to
look at how each normalization affects the downstream analysis, e.g.
by looking at the distribution of the differential expression
p-values.

In our yeast data the full-quantile normalization seemed to perform
slightly better, but this might be different in other datasets. So,
I'm afraid that you need to "look" carefully at your data after each
normalization and pick the method that leads to the most satisfactory
results, in terms of absence of bias, more uniform distribution of the
p-values, etc.
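
As a rough illustration of the p-value check, here is a minimal sketch in
base R. Everything in it is an assumption made for the example: the counts
are simulated with no true differential expression, the helper names
(upper_quartile, full_quantile, gene_pvalues) are hypothetical, the two
normalizations are simplified between-lane versions rather than the EDASeq
implementations, and a per-gene t-test stands in for a proper count-based
test.

```r
# Illustrative sketch only: simulated data, simplified normalizations.
set.seed(1)

n_genes <- 500
# Six lanes with different sequencing depths and no true group difference.
depth <- c(1, 1.5, 0.8, 2, 1.2, 0.9)
counts <- sapply(depth, function(d) rnbinom(n_genes, mu = 100 * d, size = 5))
group <- factor(c("a", "a", "a", "b", "b", "b"))

# Upper-quartile scaling: bring every lane to the same 75th percentile
# (the mean of the per-lane upper quartiles).
upper_quartile <- function(x) {
  uq <- apply(x, 2, quantile, probs = 0.75)
  sweep(x, 2, uq / mean(uq), "/")
}

# Full-quantile normalization: give every lane the same empirical
# distribution, namely the row means of the column-sorted matrix.
full_quantile <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  apply(ranks, 2, function(r) reference[r])
}

# One p-value per gene from a two-group t-test on log2 counts;
# genes for which the test cannot be computed get NA.
gene_pvalues <- function(x, group) {
  apply(log2(x + 1), 1, function(y) {
    tryCatch(t.test(y ~ group)$p.value, error = function(e) NA_real_)
  })
}

methods <- list(raw = counts,
                upper = upper_quartile(counts),
                full = full_quantile(counts))
pvals <- lapply(methods, gene_pvalues, group = group)

# With no true differences, a well-behaved analysis should give
# p-values that look roughly uniform on [0, 1].
op <- par(mfrow = c(1, 3))
for (m in names(pvals)) {
  hist(pvals[[m]], breaks = 20, main = m, xlab = "p-value")
}
par(op)
```

On real data you would replace the simulated matrix with your own counts
and the t-test with the differential expression method you actually use;
the point is only to compare the histograms side by side for each
normalization.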

Best,
davide

On Mon, Oct 21, 2013 at 7:55 AM, Catarina Almeida <catarina.fa at gmail.com> wrote:
> Got it, thanks for clarifying and for the suggestion.
> I do have another question though! Hopefully I can make it clear.
>
> For instance, for within-lane normalization, which option do we choose
> among "loess", "median", "upper" and "full" for
>  which=""
> when normalizing?
>
> I understand how they work and I understand that "full" seems a much more
> accurate way to normalize. What I fail to understand is the criterion used to
> choose between full, upper, median and loess.
> Does it depend on my experience? Is it a question of what method gives the
> best normalized plots?
>
> I've read your article and two tutorials I found on normalizing data (and
> also Bullard's 2010, on the between-lane normalization approaches) but I'm
> afraid I am still confused about this.
>
> Thanks in advance!
> Catarina
>
>
> 2013/10/16 davide risso <risso.davide at gmail.com>
>>
>> Hi Catarina,
>>
>> our within-sample normalization is meant to normalize for one factor
>> at a time.
>> In our paper (http://www.biomedcentral.com/1471-2105/12/480/) we
>> showed that in our data GC-content effects are possibly
>> library-specific and can bias differential expression, while we didn't
>> see such a library-specific effect for gene length. Hence, we propose
>> to normalize for GC-content and not for length.
>>
>> If you want to normalize for both GC-content and length, I suggest
>> having a look at the cqn normalization
>> (http://bioconductor.org/packages/release/bioc/html/cqn.html) that, if
>> I remember correctly, accounts for both effects.
>>
>> I also suggest carefully "looking" at the data, e.g. with the EDASeq
>> functions biasPlot and biasBoxplot to see if you need to normalize for
>> GC-content and/or length effects, because this may vary a lot across
>> datasets.
>>
>> Best regards,
>> Davide
>>
>> On Thu, Oct 10, 2013 at 11:05 AM, Catarina Almeida
>> <catarina.fa at gmail.com> wrote:
>> > Dear all,
>> >
>> > I'm using EDASeq to normalize my RNA-seq data.
>> >
>> > But I'm having some trouble understanding how to normalize for gc and
>> > for length... I got the idea that I needed to do it separately, like this:
>> >
>> > # within and between lane normalization for GC #
>> > dataWithinGC2 <- withinLaneNormalization(data,"gc",which="full")
>> > dataNormGC2 <- betweenLaneNormalization(dataWithinGC2,which="full")
>> >
>> > # within and between lane normalization for length ##
>> > dataWithinLength <- withinLaneNormalization(data,"length",which="full")
>> > dataNormLength <- betweenLaneNormalization(dataWithinLength,which="full")
>> >
>> > Am I thinking right? Or should I within-normalize my data for both GC
>> > and length, like this:
>> > dataWithin <- withinLaneNormalization(data,"length",which="full")
>> > dataWithin <- withinLaneNormalization(dataWithin,"gc",which="full")
>> > dataNorm   <- betweenLaneNormalization(dataWithin,which="full")
>> >
>> > Any help is much appreciated!
>> > C
>> >
>>
>>
>>
>> --
>> Davide Risso, PhD
>> Post Doctoral Scholar
>> Department of Statistics
>> University of California, Berkeley
>> 344 Li Ka Shing Center, #3370
>> Berkeley, CA 94720-3370
>> E-mail: davide.risso at berkeley.edu
>
>
