[BioC] RNAseq data normalization without differential expression
Civelek, Mete
MCivelek at mednet.ucla.edu
Mon Jul 29 18:37:07 CEST 2013
Hi Mike,
Thank you for the suggestions. I will try out the log transformation function.
Mete
-------- Original message --------
From: Michael Love <michaelisaiahlove at gmail.com>
Date:
To: "Civelek, Mete" <MCivelek at mednet.ucla.edu>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] RNAseq data normalization without differential expression
hi Mete,
On Mon, Jul 29, 2013 at 3:22 AM, Mete Civelek <mcivelek at mednet.ucla.edu> wrote:
> Dear All,
>
> I have RNAseq counts for 400 human donors. This is a random sampling of human subjects from a population-wide study. I am not interested in differential expression of genes between certain groups. The number of sequence reads differs between samples because of library size, so I need to normalize the counts.
>
> I have been reading the postings on this list regarding the normalization methods in DESeq and edgeR. I looked at the reference manuals of both of these packages. I understand that the two packages use different normalization approaches. My understanding is that while both approaches use the sample information (i.e. whether the samples are from the control or treatment condition) in order to create a list object as a first step, this information is not used in the normalization step but only in the differential expression analysis step. Is this correct?
>
Yes, this is true for DESeq/DESeq2. The transformations in DESeq2 have
an argument, blind, which defaults to TRUE. With blind=TRUE, the
dispersions used for the transformation are estimated without using
any information about the experimental design.
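
To make the point about the normalization step concrete, here is a
minimal Python sketch of the median-of-ratios idea behind the DESeq
size factors. It is an illustration under simplifying assumptions, not
the package's code: the function names size_factors and normalize and
the toy count table are made up for this example. Note that no
group/condition labels enter the calculation anywhere.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has a nonzero count in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the per-gene geometric mean.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy data: 4 genes x 3 samples; sample 3 was sequenced
# four times as deeply as sample 1.
counts = [
    [10, 20, 40],
    [100, 200, 400],
    [5, 10, 20],
    [0, 3, 1],
]
sf = size_factors(counts)
norm = normalize(counts, sf)
```

In this toy table the first three genes differ between samples only
by sequencing depth, so after dividing by the size factors their
normalized counts agree across samples.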
It depends on what you want to do with the normalized data, but the
VST or rlog transformation should help you, for instance, to cluster
samples or genes in a large data set, by stabilizing the variance
across the range of mean counts.
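
As a rough illustration of what "stabilizing the variance" means,
the Python sketch below uses the textbook case of Poisson counts and
the square-root transform. This is a deliberately simpler stand-in,
not the VST or rlog in DESeq2, which are built on a negative binomial
model with estimated dispersions. The helper names are made up for
this example. The variance of the raw counts grows with the mean,
while the variance of the square-rooted counts stays close to 1/4.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(X = k), computed on the log scale."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def variance_under_poisson(mu, transform):
    """Variance of transform(X) for X ~ Poisson(mu).

    The sum is truncated far in the upper tail, where the remaining
    probability mass is negligible.
    """
    upper = int(mu + 12 * math.sqrt(mu)) + 20
    ks = range(upper + 1)
    probs = [poisson_pmf(k, mu) for k in ks]
    mean = sum(p * transform(k) for k, p in zip(ks, probs))
    return sum(p * (transform(k) - mean) ** 2 for k, p in zip(ks, probs))


# Compare raw and square-root-transformed variance at three mean levels.
results = {}
for mu in (10, 100, 1000):
    raw_var = variance_under_poisson(mu, float)
    sqrt_var = variance_under_poisson(mu, math.sqrt)
    results[mu] = (raw_var, sqrt_var)
    print(mu, raw_var, sqrt_var)
```

Without such a transformation, distance-based clustering tends to be
dominated by the few genes with the highest counts, simply because
their variance is largest.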
If there are large differences in library sizes, we recommend using
rlogTransformation(). Furthermore, the rlog implementation in the
devel branch seems to perform qualitatively better than the one in the
release branch. The difference is that in the devel branch, the rlog
transformation uses the fitted dispersion values rather than the
shrunken dispersion estimates. This makes the rlog perform more like
the VST, and avoids squashing what could be large, true differences
across samples for high count genes.
Mike