[BioC] RNASeq: normalization issues

Wolfgang Huber whuber at embl.de
Sun May 1 10:13:19 CEST 2011


Hi Yiwen

Gene length adjustment usually does not make sense at this stage of the
analysis (before assessing significance of differential expression), as
it discards the information on count numbers, which is important for
assessing significance in the low count range.

It may or may not make sense at a later point of the analysis (after 
assessing significance).
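
To illustrate the point about count numbers, here is a minimal Python sketch. All numbers in it (library size, gene lengths, read counts) are made up for illustration, and the Poisson assumption is only a rough stand-in for the sampling noise of read counts, not the model used by edgeR or any other package. It shows two hypothetical genes with the same RPKM whose underlying counts carry very different amounts of evidence.

```python
import math


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


def poisson_relative_sd(count):
    """Relative standard deviation of a Poisson count: sqrt(n) / n."""
    return 1.0 / math.sqrt(count)


# Hypothetical library size and genes, chosen so that both RPKM values match.
LIBRARY_SIZE = 10_000_000
genes = {
    "short_gene": {"count": 5, "length_bp": 500},
    "long_gene": {"count": 100, "length_bp": 10_000},
}

for name, g in genes.items():
    value = rpkm(g["count"], g["length_bp"], LIBRARY_SIZE)
    noise = poisson_relative_sd(g["count"])
    print(f"{name}: count={g['count']}, RPKM={value:.2f}, relative SD={noise:.2f}")
```

Both genes come out with the same RPKM, but the gene with 5 reads has a much larger relative sampling uncertainty than the gene with 100 reads. Once only the length-adjusted values are kept, that difference can no longer be recovered.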

	Best wishes
	Wolfgang


On May/1/11 6:44 AM, ywchen at jimmy.harvard.edu wrote:
> Hi Wei,
>
> Could you elaborate on how to appropriately do gene-length-adjusted
> quantile normalization in edgeR? The "quantile normalization" option in
> calcNormFactors function does not seem to take into account the gene
> length.
>
> Thanks.
> Yiwen
>> Hi João:
>>
>> 	Maybe you can try different normalization methods for your data to see
>> which one looks better. How to best normalize RNA-seq data is still of
>> much debate at this stage.
>>
>> 	You can try scaling methods like TMM, RPKM, or 75th percentile, which as
>> you said normalize data within samples. Or you can try quantile
>> between-sample normalization (read counts should be adjusted by gene
>> length first), which performs normalization across samples. You can try
>> all these in edgeR package.
>>
>> 	From my experience, I actually found the quantile method performed better
>> for my RNA-seq data. I used a generalized linear model and likelihood
>> ratio test in edgeR in my analysis.
>>
>> 	Hope this helps.
>>
>> Cheers,
>> Wei
>>
>> On Apr 28, 2011, at 7:36 PM, João Moura wrote:
>>
>>> Dear all,
>>>
>>>
>>> Until now I was doing RNAseq DE analysis, and for that I understand that
>>> normalization issues only matter inside samples, because one can assume
>>> the length/content biases will cancel out when comparing the same genes
>>> in different samples.
>>> However, I'm now trying to compare the correlation of different genes,
>>> so these biases should be taken into account. For this, is there any
>>> better method than RPKM?
>>>
>>> My main doubt is whether I should also take into account the biases
>>> inside samples, and if so, is there any better approach than TMM by
>>> Robinson and Oshlack [2010]?
>>>
>>> Thank you all,
>>> --
>>> João Moura
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber