[BioC] edgeR normalization factors
Naomi Altman
naomi at stat.psu.edu
Tue Jun 29 17:20:44 CEST 2010
Of course Mark is correct for DE analysis. What
I should have said is that the normalized library
size should be used for DE, and this is certainly covered in the paper.
For clustering, I think you probably will need to
change the data - but it depends on what you are
clustering and the distance measure.
--Naomi
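
As a rough illustration of the arithmetic discussed in this thread, here is a
minimal Python sketch (Python rather than R, purely for illustration). The
counts and the normalization factors are made up by hand; in practice the
factors would come from calcNormFactors() in edgeR. It divides raw counts by
the effective library size (library size times normalization factor), scales
to counts per 10M, and also computes the log effective library sizes that
would serve as offsets when the counts are modelled as is.

```python
import math

# Toy raw counts: one row per gene, one column per library (samples A and B).
# Genes 1-3 are assumed equally abundant in both samples; gene 4 dominates B.
counts = {
    "gene1": [100, 50],
    "gene2": [100, 50],
    "gene3": [100, 50],
    "gene4": [100, 850],
}

# Library sizes are the column totals of the raw counts.
lib_sizes = [sum(row[j] for row in counts.values()) for j in range(2)]

# Hypothetical normalization factors, chosen by hand for this toy data.
# They are NOT computed by TMM; edgeR's calcNormFactors() would supply them.
norm_factors = [1.0, 0.2]

# Effective (normalized) library size = library size * normalization factor.
eff_lib_sizes = [n * f for n, f in zip(lib_sizes, norm_factors)]


def per_10m(count, size):
    """Scale a raw count to counts per 10 million for a given library size."""
    return count / size * 1e7


# Scaling by the raw library size versus the effective library size.
naive = {g: [per_10m(c, n) for c, n in zip(row, lib_sizes)]
         for g, row in counts.items()}
normalized = {g: [per_10m(c, n) for c, n in zip(row, eff_lib_sizes)]
              for g, row in counts.items()}

# For DE modelling the counts stay untouched; the log effective library
# sizes are what would enter a log-linear model as offsets.
offsets = [math.log(n) for n in eff_lib_sizes]

for gene in counts:
    print(gene, naive[gene], normalized[gene])
```

With these toy numbers, genes 1 to 3 look five-fold lower in the second
library when scaled by the raw library size, but agree once the effective
library size is used. The normalized values are meant for clustering and
visualization only; the DE analysis keeps the raw counts.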
At 04:21 AM 6/29/2010, Mark Robinson wrote:
>(Travelling so this is a rather quick response)
>
>I disagree with Naomi.
>
>First, for a differential expression analysis, we prefer to use the counts
>as is, and use the normalization factors as offsets in the statistical
>modeling. So, these normalization factors actually DO NOT change the data
>(this is unlike microarray data normalization).
>
>Second, for clustering, visualization etc. you may want to calculate a
>normalized expression value. Take the normalization factors that you
>calculate using calcNormFactors(), multiply them by the library sizes (see
>Section 6 of the manual), and DIVIDE your raw counts by the resulting number
>for each library. Maybe also multiply by 10M so you have counts per 10M?
>
>I think what Naomi is talking about (highly expressed genes depressing the
>expression of other genes) is covered in our paper:
>http://genomebiology.com/2010/11/3/R25
>
>Cheers,
>Mark
>
> > Multiply.
> >
> > And yes, you should use the normalized data for
> > DE and clustering. Otherwise, highly expressing
> > genes in your sample will depress the expression
> > of other genes relative to the size of the
> > library, inducing spurious "differential"
> > expression. I have been simulating data to try to understand this better.
> >
> > --Naomi
> >
> > At 11:19 PM 6/27/2010, Zhe wrote:
> >>Hello,
> >>
> >>I have a question about using TMM normalization
> >>factors. I want to modify the count for each
> >>gene after normalization. Do I just need to
> >>divide the count of each gene by the
> >>normalization factor for its library? Then
> >>can I use the normalized data for DE
> >>analysis and other further analysis (e.g. clustering)?
> >>
> >>Thanks a lot,
> >>Zhe
> >>
> >>_______________________________________________
> >>Bioconductor mailing list
> >>Bioconductor at stat.math.ethz.ch
> >>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>Search the archives:
> >>http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > Naomi S. Altman 814-865-3791 (voice)
> > Associate Professor
> > Dept. of Statistics 814-863-7114 (fax)
> > Penn State University 814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >
>