[BioC] RNAseq data normalization without differential expression

Michael Love michaelisaiahlove at gmail.com
Mon Jul 29 14:58:10 CEST 2013


hi Mete,

On Mon, Jul 29, 2013 at 3:22 AM, Mete Civelek <mcivelek at mednet.ucla.edu> wrote:
> Dear All,
>
> I have RNAseq counts for 400 human donors. This is a random sampling of human subjects from a population-wide study. I am not interested in differential expression of genes between certain groups. The number of sequence reads differs between samples because of library size, so I need to normalize the counts.
>
> I have been reading the postings on this list regarding the normalization methods in DESeq and edgeR. I looked at the reference manuals of both of these packages. I understand that they use different normalization approaches. My understanding is that while both approaches use the sample information (i.e. whether they are from control or treatment condition) in order to create a list object as a first step, this information is not used in the normalization step but only in the differential expression analysis step. Is this correct?
>

Yes, this is true for DESeq/DESeq2. The transformations in DESeq2 have
an argument, blind, which defaults to TRUE. With blind=TRUE, the
dispersion used for the transformation is estimated without using any
information about the experimental design.
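
A minimal sketch of normalizing for library size without any group
information, assuming a DESeq2 installation. The simulated counts and
the names depth, cts, coldata and norm_counts are placeholders made up
for illustration, not part of the original data:

```r
library(DESeq2)

# Simulated stand-in for real data: 1000 genes x 12 samples whose
# sequencing depth differs by up to 4-fold (all values are made up).
set.seed(1)
depth <- rep(c(0.5, 1, 2), times = 4)
cts <- counts(makeExampleDESeqDataSet(n = 1000, m = 12, sizeFactors = depth))

# No groups to compare, so the sample table only needs an identifier
# and the design is intercept-only.
coldata <- data.frame(donor = colnames(cts), row.names = colnames(cts))
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ 1)

# Size factors are estimated from the counts alone.
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

# Counts divided by the per-sample size factor.
norm_counts <- counts(dds, normalized = TRUE)
```

With real data, cts would be the integer count matrix (genes in rows,
donors in columns) in place of the simulated one.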

What to use depends on what you want to do with the normalized data,
but the VST or rlog transformation should help if you want to, for
instance, cluster samples or genes in a large data set, because both
stabilize the variance across the range of mean counts.

If there are large differences in library sizes, we recommend using
rlogTransformation(). Furthermore, the rlog implementation in the
devel branch seems to perform qualitatively better than the one in the
release branch. The difference is that in the devel branch, the rlog
transformation uses the fitted dispersion values rather than the
shrunken dispersion estimates. This makes the rlog perform more like
the VST, and avoids squashing what could be large, true differences
across samples for high count genes.
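
A rough sketch of applying both transformations blind to the design
and then clustering samples, assuming a DESeq2 version in which
rlogTransformation() and varianceStabilizingTransformation() accept
the blind argument. The simulated data set and the names depth, rld,
vsd, sample_dist and hc are made up for illustration:

```r
library(DESeq2)

# Simulated example data: 1000 genes x 12 samples with library sizes
# that differ by up to 4-fold (values are made up).
set.seed(1)
depth <- rep(c(0.5, 1, 2), times = 4)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, sizeFactors = depth)

# blind = TRUE (the default) estimates dispersions without using
# the experimental design.
rld <- rlogTransformation(dds, blind = TRUE)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Euclidean distances between samples on the rlog scale,
# followed by hierarchical clustering.
sample_dist <- dist(t(assay(rld)))
hc <- hclust(sample_dist)
```

Calling plot(hc) draws the sample dendrogram; swapping assay(rld) for
assay(vsd) gives the same analysis on the VST scale for comparison.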

Mike


