[BioC] RNASeq: normalization issues

ywchen at jimmy.harvard.edu
Sun May 1 16:26:14 CEST 2011


Hi Wolfgang,

Thanks for the note. I understand it is more statistically sound to work
on the count-level data before assessing DGE. However, Wei's original
email to João says: "Or you can try quantile between-sample
normalization (read counts should be adjusted by gene length first), which
performs normalization across samples. You can try all these in edgeR
package. From my experience, I actually found the quantile method
performed better for my RNA-seq data."
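
To make the quoted procedure concrete, here is a minimal sketch of what "adjust by gene length first, then quantile-normalize across samples" could look like. It is plain Python, not edgeR code, and it does not claim to reproduce what Wei actually ran: the function names, the toy counts, and the gene lengths are all made up for illustration, and ties are broken simply by gene order.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million reads, for one sample."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def quantile_normalize(samples):
    """Force every sample to share one distribution of values.

    samples: list of equal-length lists, one list per sample.
    Each value is replaced by the mean, across samples, of the values
    that hold the same rank. Ties are broken by gene order (a
    simplification; real implementations usually average tied ranks).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: 4 genes, 2 samples.
lengths = [1000, 2000, 500, 4000]   # gene lengths in bp
counts_a = [100, 400, 60, 440]      # sample A raw counts
counts_b = [300, 500, 400, 800]     # sample B raw counts

adjusted = [rpkm(counts_a, lengths), rpkm(counts_b, lengths)]
normalized = quantile_normalize(adjusted)
```

After this step both samples contain exactly the same set of values, and only the assignment of values to genes differs. Note that the output is no longer a table of counts.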

Wei also mentioned some empirical evidence of the superiority of quantile
normalization on the data from the MAQC project (calibrated against
gold-standard qPCR data).

I guess I may have some misunderstanding of the message there.

Yiwen
> Hi Yiwen
>
> Gene length adjustment usually does not make sense at this stage of the
> analysis (before assessing significance of differential expression), as
> it eliminates the information on count numbers, which is important for
> assessing significance in the low count range.
>
> It may or may not make sense at a later point of the analysis (after
> assessing significance).
>
> 	Best wishes
> 	Wolfgang
>
>
> On May/1/11 6:44 AM, ywchen at jimmy.harvard.edu wrote:
>> Hi Wei,
>>
>> Could you elaborate on how to appropriately do gene-length-adjusted
>> quantile normalization in edgeR? The "quantile normalization" option in
>> calcNormFactors function does not seem to take into account the gene
>> length.
>>
>> Thanks.
>> Yiwen
>>> Hi João:
>>>
>>> 	Maybe you can try different normalization methods on your data to see
>>> which one looks better. How best to normalize RNA-seq data is still
>>> much debated at this stage.
>>>
>>> 	You can try scaling methods like TMM, RPKM, or 75th percentile, which
>>> as you said normalize data within samples. Or you can try quantile
>>> between-sample normalization (read counts should be adjusted by gene
>>> length first), which performs normalization across samples. You can try
>>> all these in edgeR package.
>>>
>>> 	From my experience, I actually found the quantile method performed
>>> better for my RNA-seq data. I used a generalized linear model and
>>> likelihood ratio test in edgeR in my analysis.
>>>
>>> 	Hope this helps.
>>>
>>> Cheers,
>>> Wei
>>>
>>> On Apr 28, 2011, at 7:36 PM, João Moura wrote:
>>>
>>>> Dear all,
>>>>
>>>> Until now I have been doing RNA-seq DE analysis, and for that I
>>>> understand that normalization issues only matter inside samples,
>>>> because one can assume the length/content biases will cancel out when
>>>> comparing the same genes in different samples.
>>>> However, I'm now trying to compare the correlation of different genes,
>>>> so these biases should be taken into account. For this, is there any
>>>> better method than RPKM?
>>>>
>>>> My main doubt is whether I should also take into account the biases
>>>> inside samples, and if so, is there any better approach than TMM by
>>>> Robinson and Oshlack [2010]?
>>>>
>>>> Thank you all,
>>>> --
>>>> João Moura
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>
>
> --
>
>
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber
>
>
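
Wolfgang's point in the quoted reply, that length adjustment throws away the information carried by the size of the counts, can be illustrated with a small sketch. The two genes, their lengths, and their counts below are invented, and the test is a simple conditional binomial test that assumes Poisson-distributed counts and equal library sizes. It is not the test used by edgeR or any other package; it only shows why two genes with identical length-adjusted values can carry very different evidence.

```python
from math import comb


def two_sided_binomial_p(x, y):
    """P-value for H0: both conditions have the same rate.

    Conditional on n = x + y reads in total, x is Binomial(n, 0.5)
    under H0 (assuming Poisson counts and equal library sizes).
    """
    n = x + y
    lo = min(x, y)
    # Sum the lower tail with exact integers and divide once at the end,
    # so that large n does not underflow to zero along the way.
    tail = sum(comb(n, k) for k in range(lo + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical genes: (length in kb, count in condition 1, count in condition 2)
short_gene = (1.0, 5, 10)
long_gene = (100.0, 500, 1000)

# Length-adjusted values (reads per kb) are identical for the two genes.
per_kb_short = (short_gene[1] / short_gene[0], short_gene[2] / short_gene[0])
per_kb_long = (long_gene[1] / long_gene[0], long_gene[2] / long_gene[0])

# The raw counts, however, support very different conclusions.
p_short = two_sided_binomial_p(short_gene[1], short_gene[2])  # roughly 0.30
p_long = two_sided_binomial_p(long_gene[1], long_gene[2])     # vanishingly small

print(per_kb_short, per_kb_long)
print(p_short, p_long)
```

Both genes show the same two-fold change and the same reads-per-kb values, yet only the gene with hundreds of reads gives strong evidence of a difference. Once the counts have been divided by gene length, that distinction can no longer be recovered.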


