[BioC] edgeR normalization factors
Wolfgang Huber
whuber at embl.de
Tue Jun 29 17:29:16 CEST 2010
Zhe,
for clustering and similar endeavours, transforming the data to a
"logarithm-like" variance-stabilised scale is useful. See e.g. chapter 7
"Sample clustering" of the vignette of the DESeq package.
For differential expression, I agree with Mark that you want to use the
counts as is, and use the normalization factors as parameters in the
statistical modeling.
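
To make the "factors as model parameters" idea concrete, here is a minimal sketch, not edgeR's or DESeq's actual code. It assumes a simple log-linear mean model in which the expected count equals the effective library size (library size times normalization factor) times the gene's relative abundance. The library sizes, factors, and function names below are made up for illustration.

```python
import math

# Hypothetical toy values: raw library sizes and normalization factors
# such as those returned by calcNormFactors() (numbers are invented).
lib_sizes = [1_000_000, 2_000_000]
norm_factors = [0.8, 1.25]


def effective_library_sizes(lib_sizes, norm_factors):
    """Library size times normalization factor, one value per library."""
    return [n * f for n, f in zip(lib_sizes, norm_factors)]


def log_offsets(lib_sizes, norm_factors):
    """Offsets for a log-linear model: log of the effective library size."""
    return [math.log(e) for e in effective_library_sizes(lib_sizes, norm_factors)]


def expected_count(relative_abundance, offset):
    """Assumed mean model: log(mu) = offset + log(relative abundance)."""
    return math.exp(offset + math.log(relative_abundance))


offsets = log_offsets(lib_sizes, norm_factors)
# The raw counts are never altered; only the model mean changes per library.
mu_lib1 = expected_count(1e-5, offsets[0])
mu_lib2 = expected_count(1e-5, offsets[1])
```

The point of the sketch is that the observed counts stay untouched: the same relative abundance simply implies a different expected count in each library.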
Wolfgang
On Jun/29/10 10:21 AM, Mark Robinson wrote:
>
> (Travelling so this is a rather quick response)
>
> I disagree with Naomi.
>
> First, for a differential expression analysis, we prefer to use the counts
> as is, and use the normalization factors as offsets in the statistical
> modeling. So, these normalization factors actually DO NOT change the data
> (this is unlike microarray data normalization).
>
> Second, for clustering, visualization etc. you may want to calculate a
> normalized expression value. Using the normalization factors that you
> calculate using calcNormFactors() multiplied by the library size (See
> Section 6 of the manual), you could DIVIDE your raw counts by this number
> for each library. Maybe also multiply by 10M so you have counts per 10M?
>
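
The arithmetic Mark describes for clustering and visualization can be sketched as follows. This is only an illustration with invented numbers and a hypothetical helper name, not a function from edgeR: each raw count is divided by the normalization factor times the library size, then rescaled to counts per 10M.

```python
def normalized_counts(counts, lib_size, norm_factor, per=10_000_000):
    """Divide raw counts by (norm factor * library size), scale to counts per `per`."""
    effective_size = lib_size * norm_factor
    return [c / effective_size * per for c in counts]


# Hypothetical library: 1M reads, normalization factor 0.8.
raw = [80, 800, 0]
scaled = normalized_counts(raw, lib_size=1_000_000, norm_factor=0.8)
```

These rescaled values are meant for clustering or plotting only; for differential expression the raw counts are kept and the factors enter the model instead.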
> I think what Naomi is talking about (highly expressed genes depressing the
> expression of other genes) is covered in our paper:
> http://genomebiology.com/2010/11/3/R25
>
> Cheers,
> Mark
>
>> Multiply.
>>
>> And yes, you should use the normalized data for
>> DE and clustering. Otherwise, highly expressing
>> genes in your sample will depress the expression
>> of other genes relative to the size of the
>> library, inducing spurious "differential"
>> expression. I have been simulating data to try to understand this better.
>>
>> --Naomi
>>
>> At 11:19 PM 6/27/2010, Zhe wrote:
>>> Hello,
>>>
>>> I have a question about using TMM normalization
>>> factors. I want to modify the count for each
>>> gene after normalization. Do I just need to
>>> divide the count of each gene by the
>>> normalization factor for its library? Then, I
>>> may use the normalized data for DE
>>> analysis and other further analysis (e.g. clustering).
>>>
>>> Thanks a lot,
>>> Zhe
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> Naomi S. Altman 814-865-3791 (voice)
>> Associate Professor
>> Dept. of Statistics 814-863-7114 (fax)
>> Penn State University 814-865-1348 (Statistics)
>> University Park, PA 16802-2111
>>