[BioC] DESeq questions
Simon Anders
anders at embl.de
Tue Dec 7 15:25:11 CET 2010
Hi Shrey
> I have RNA-seq count data for 30,400 genes across 6 conditions (3 replicates
> per condition). I was trying different normalization methods and then test
> for differentially expressed genes between conditions. How to test whether
> estimateSizeFactors() and estimateVarianceFunctions() does a good fit for my
> data? Also is there a way to test whether the normalization is good ? Any
> help is greatly appreciated,
To test whether the normalization (i.e., the size factor estimation)
worked fine, do an MA plot for a pair of samples and mark the size
factor log ratio with a horizontal line.
Here is a demonstration with example data:
library( DESeq )
# Make some example data (or use your real data)
cds <- makeExampleCountDataSet( )
# estimate the size factors
cds <- estimateSizeFactors( cds )
# Choose two samples for which you want to check whether they are
# properly normalized with respect to each other
s1 <- 1; s2 <- 2
# Make the MA plot, i.e., plot the log fold change between the two samples
# against the mean of the log counts
plot(
   ( log10( counts(cds)[,s1] ) + log10( counts(cds)[,s2] ) )/2,
   log10( counts(cds)[,s2] ) - log10( counts(cds)[,s1] ),
   xlab="mean of log10 counts", ylab="log10 fold change" )
# In this plot, the bulk of the genes which are not differentially
# expressed should scatter around a horizontal line in the middle.
# The position of this line should be given by the log ratio of
# the size factors. Mark the latter:
abline(
   h=log10( sizeFactors(cds)[s2] ) - log10( sizeFactors(cds)[s1] ),
   col="red" )
# Now, the red line should go right through the middle of the bulk of
# the genes that are not differentially expressed.
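The same check can also be done as a number rather than a picture. What follows is only a rough sketch in base R, not DESeq code: the simulated counts, the depth factors and all variable names are made up for illustration, and the size factors are computed by hand with a median-of-ratios rule instead of calling estimateSizeFactors(). If the normalization is reasonable, the median per-gene log ratio between two samples should be close to the log ratio of their size factors.

```r
# Hypothetical simulated counts: 1000 genes, 4 samples with different
# sequencing depths (all numbers here are made up for illustration)
set.seed(1)
mu    <- rexp( 1000, rate = 1/200 ) + 20
depth <- c( 1, 2, 0.5, 1.5 )
cnt   <- sapply( depth, function(d) rpois( 1000, d * mu ) )
colnames(cnt) <- paste0( "s", 1:4 )

# Median-of-ratios size factors, computed by hand
keep   <- rowSums( cnt == 0 ) == 0     # drop genes with a zero count
logc   <- log( cnt[ keep, ] )
loggeo <- rowMeans( logc )             # log geometric mean per gene
sf     <- exp( apply( logc - loggeo, 2, median ) )

# Compare the bulk of the per-gene log ratios (their median) with the
# log ratio of the size factors for samples s1 and s2
obs  <- median( log10( cnt[ keep, "s2" ] ) - log10( cnt[ keep, "s1" ] ) )
expd <- unname( log10( sf["s2"] ) - log10( sf["s1"] ) )
cat( "median log10 ratio:     ", obs,  "\n" )
cat( "size factor log10 ratio:", expd, "\n" )
```

With real data, replace cnt by your own count matrix (for a DESeq object, counts(cds)). A clear gap between the two printed numbers corresponds to the red line missing the bulk of the points in the MA plot.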
I hope that helps
Simon