Got it, thanks for clarifying and for the suggestion.
I do have another question though! Hopefully I can make it clear.

For instance, for within-lane normalization, which value do we
choose among "upper", "loess", "median" and "full" for
 which=""
when normalizing?

I understand how they work and I understand that "full" seems a much more
accurate way to normalize. What I fail to understand is which criteria are
used to choose between full, upper, median and loess.
Does it depend on my experience? Is it a question of what method gives the
best normalized plots?
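
To make the question concrete, here is a rough sketch of the kind of
comparison I have in mind. It assumes `data` is a SeqExpressionSet with a
"gc" column in its feature data, as in my earlier message. The helper name
compareWithinLane is hypothetical (it is not an EDASeq function); it only
loops over the four which= values and draws one bias plot per method.

```r
library(EDASeq)

# Hypothetical helper, not part of EDASeq: run withinLaneNormalization()
# once per method and draw a bias plot for each result, next to the
# unnormalized data, so the curves can be compared by eye.
compareWithinLane <- function(data, covariate = "gc",
                              methods = c("loess", "median", "upper", "full")) {
  # One normalized SeqExpressionSet per method, named by method.
  normalized <- lapply(methods, function(m) {
    withinLaneNormalization(data, covariate, which = m)
  })
  names(normalized) <- methods

  # One panel for the raw counts plus one panel per method.
  op <- par(mfrow = c(1, length(methods) + 1))
  on.exit(par(op))

  biasPlot(data, covariate, log = TRUE)
  title(main = "unnormalized")
  for (m in methods) {
    biasPlot(normalized[[m]], covariate, log = TRUE)
    title(main = m)
  }

  invisible(normalized)
}

# Usage, assuming `data` already exists:
# res <- compareWithinLane(data, "gc")
```

Is picking the method whose curves look flattest across lanes a reasonable
criterion, or is there a more principled one?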

I've read your article and two tutorials I found on normalizing data (and
also Bullard's 2010 paper on the between-lane normalization approaches), but
I'm afraid I am still confused about this.

Thanks in advance!
Catarina


2013/10/16 davide risso <risso.davide@gmail.com>

> Hi Catarina,
>
> our within-sample normalization is meant to normalize for one factor
> at a time.
> In our paper (http://www.biomedcentral.com/1471-2105/12/480/) we
> showed that in our data GC-content effects are possibly
> library-specific and can bias differential expression, while we didn't
> see such a library-specific effect for gene length. Hence, we propose
> to normalize for GC-content and not for length.
>
> If you want to normalize for both GC-content and length, I suggest
> having a look at the cqn normalization
> (http://bioconductor.org/packages/release/bioc/html/cqn.html) that, if
> I remember correctly, accounts for both effects.
>
> I also suggest to carefully "look" at the data, e.g. with the EDASeq
> functions biasPlot and biasBoxplot to see if you need to normalize for
> GC-content and/or length effects, because this may vary a lot across
> datasets.
>
> Best regards,
> Davide
>
> On Thu, Oct 10, 2013 at 11:05 AM, Catarina Almeida
> <catarina.fa@gmail.com> wrote:
> > Dear all,
> >
> > I'm using EDASeq to normalize my RNA-seq data.
> >
> > But I'm having some trouble understanding how to normalize for gc and for
> > length... I got the idea that I needed to do it separately, like this:
> >
> > # within and between lane normalization for GC #
> > dataWithinGC2 <- withinLaneNormalization(data,"gc",which="full")
> > dataNormGC2 <- betweenLaneNormalization(dataWithinGC2,which="full")
> >
> > # within and between lane normalization for length ##
> > dataWithinLength <- withinLaneNormalization(data,"length",which="full")
> > dataNormLength <- betweenLaneNormalization(dataWithinLength,which="full")
> >
> > Am I thinking right? Or should I within-normalize my data for both GC and
> > length, like this:
> > dataWithin <- withinLaneNormalization(data,"length",which="full")
> > dataWithin <- withinLaneNormalization(dataWithin,"gc",which="full")
> > dataNorm   <- betweenLaneNormalization(dataWithin,which="full")
> >
> > Any help is much appreciated!
> > C
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> --
> Davide Risso, PhD
> Post Doctoral Scholar
> Department of Statistics
> University of California, Berkeley
> 344 Li Ka Shing Center, #3370
> Berkeley, CA 94720-3370
> E-mail: davide.risso@berkeley.edu
>
