[BioC] RNASeq: normalization issues

Wei Shi shi at wehi.EDU.AU
Fri Apr 29 05:26:49 CEST 2011


Hi Fernando:

	We had some positive control genes which we knew, from previous RT-PCR experiments, should be up- or down-regulated in one cell type compared to the other. The quantile method successfully detected all of these control genes and ranked them higher in the list of differentially expressed genes than the other normalization methods did. You could certainly argue that this is a biased comparison, but when you do not know which method works best, the one that gives results closer to your expectation is often preferred.

	My belief in the quantile method actually came mainly from an evaluation study using the RNA-seq data from the MAQC project, in which the expression levels of ~1000 genes were validated by RT-PCR. What I found was that the quantile-normalized data correlated better with the PCR data than data from the other normalization methods. This work hasn't been published yet, but I am working on that.
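
For reference, the quantile between-sample normalization discussed in this thread can be sketched roughly as follows. This is only a minimal pure-Python illustration under stated assumptions, not the edgeR code path: the function names (length_adjust, quantile_normalize), the per-kilobase length adjustment, and the toy numbers are all made up for the example, and ties are broken by position rather than averaged.

```python
from statistics import mean


def length_adjust(counts, gene_lengths_bp):
    """Divide each read count by its gene length in kilobases (hypothetical helper)."""
    return [c / (length / 1000.0) for c, length in zip(counts, gene_lengths_bp)]


def quantile_normalize(samples):
    """Give every sample the same empirical distribution.

    `samples` is a list of equal-length lists, one list per sample,
    with genes in the same order in each.
    """
    n_genes = len(samples[0])
    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[i] for s in sorted_samples) for i in range(n_genes)]

    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]  # replace value by the rank mean
        normalized.append(out)
    return normalized


# Toy data (invented): 3 genes, 2 samples.
gene_lengths_bp = [1000, 2000, 1000]
raw_counts = [[10, 40, 30], [20, 5, 60]]

adjusted = [length_adjust(sample, gene_lengths_bp) for sample in raw_counts]
normalized = quantile_normalize(adjusted)
```

After this step every sample contains the same set of values, so the per-gene values can be compared against an external reference such as RT-PCR measurements.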

Cheers,
Wei


On Apr 29, 2011, at 12:51 PM, Biase, Fernando wrote:

> Dr. Wei,
> 
> If I may ask: what criteria do you use to decide which normalization best suits your data?
> 
> thanks,
> Fernando
> 
> ________________________________________
> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] On Behalf Of Wei Shi [shi at wehi.EDU.AU]
> Sent: Thursday, April 28, 2011 6:07 PM
> To: João Moura
> Cc: bioconductor at r-project.org list
> Subject: Re: [BioC] RNASeq: normalization issues
> 
> Hi João:
> 
>        Maybe you can try different normalization methods on your data to see which one looks better. How best to normalize RNA-seq data is still much debated at this stage.
> 
>        You can try scaling methods like TMM, RPKM, or 75th percentile, which, as you said, normalize data within samples. Or you can try quantile between-sample normalization (read counts should be adjusted by gene length first), which performs normalization across samples. You can try all of these in the edgeR package.
> 
>        In my experience, the quantile method actually performed better for my RNA-seq data. I used a generalized linear model and the likelihood ratio test in edgeR in my analysis.
> 
>        Hope this helps.
> 
> Cheers,
> Wei
> 
> On Apr 28, 2011, at 7:36 PM, João Moura wrote:
> 
>> Dear all,
>> 
>> 
>> Until now I have been doing RNAseq DE analysis, and for that I understand that
>> normalization issues only matter inside samples, because one can assume the
>> length/content biases will cancel out when comparing the same genes in different
>> samples.
>> However, I'm now trying to compare the correlation of different genes, so
>> these biases should be taken into account. For this, is there any better
>> method than RPKM?
>> 
>> My main doubt is whether I should also take into account the biases inside samples,
>> and, if so, whether there is any better approach than TMM by Robinson and Oshlack
>> [2010].
>> 
>> Thank you all,
>> --
>> João Moura
