[BioC] DESeq questions

Simon Anders anders at embl.de
Tue Dec 7 15:25:11 CET 2010


Hi Shrey

> I have RNA-seq count data for 30,400 genes across 6 conditions (3 replicates
> per condition). I was trying different normalization methods and then testing
> for differentially expressed genes between conditions. How can I test whether
> estimateSizeFactors() and estimateVarianceFunctions() give a good fit for my
> data? Also, is there a way to test whether the normalization is good? Any
> help is greatly appreciated,

To test whether the normalization (i.e., the size factor estimation)
worked well, make an MA plot for a pair of samples and mark the log
ratio of their size factors with a horizontal line.
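For context on what the horizontal line represents: DESeq's size factors are based on the median-of-ratios idea, where each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across all samples. The following is a minimal base-R sketch of that idea, not DESeq's own code; the function name `medianOfRatios` and the simulated counts are hypothetical and made up for illustration.

```r
# Hypothetical helper (a sketch, not DESeq's implementation):
# median-of-ratios size factors for a genes x samples matrix of raw counts.
medianOfRatios <- function( counts_mat ) {
   # log geometric mean of each gene across all samples;
   # a gene with a zero count in any sample gets -Inf
   logGeoMeans <- rowMeans( log( counts_mat ) )
   usable <- is.finite( logGeoMeans )
   # size factor of a sample = median ratio of its counts to the
   # gene-wise geometric means, using only genes with no zero counts
   apply( counts_mat[ usable, , drop=FALSE ], 2, function( cnts )
      exp( median( log( cnts ) - logGeoMeans[ usable ] ) ) )
}

# Simulated example (made-up data): three samples whose true depths
# differ by the factors 1, 2 and 0.5
set.seed( 1 )
mu <- runif( 2000, 50, 500 )
trueSf <- c( 1, 2, 0.5 )
m <- sapply( trueSf, function( s ) rpois( length( mu ), s * mu ) )
sf <- medianOfRatios( m )
sf   # should come out close to 1, 2, 0.5
```

Because the true factors here multiply to 1, the estimates land on the same scale as `trueSf`; with real data only the ratios between size factors matter, which is exactly what the MA-plot check looks at.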

Here is a demonstration of the MA-plot check with example data:


   library( DESeq )

   # Make some example data (or use your real data )
   cds <- makeExampleCountDataSet( )

   # estimate the size factors
   cds <- estimateSizeFactors( cds )

   # Choose two samples for which you want to check whether they are
   # properly normalized with respect to each other
   s1 <- 1; s2 <- 2

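   # Make the MA plot, i.e., plot the log fold change between the two
   # samples against the mean of their log counts. Genes with a zero
   # count in either sample give non-finite values and are not drawn.
   plot(
      ( log10( counts(cds)[,s1] ) + log10( counts(cds)[,s2] ) )/2,
      log10( counts(cds)[,s2] ) - log10( counts(cds)[,s1] ),
      xlab="mean log10 count", ylab="log10 ratio (s2 / s1)" )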

   # In this plot, the bulk of the genes which are not differentially
   # expressed should scatter around a horizontal line in the middle.
   # The position of this line should be given by the log ratio of
   # the size factors. Mark the latter:
   abline(
      h=log10( sizeFactors(cds)[s2] ) - log10( sizeFactors(cds)[s1] ),
      col="red" )

   # Now, the red line should go right through the middle of the bulk of
   # not differentially expressed genes.
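If you would like a number to go with the visual check, you can compare the median log ratio of the two samples (a robust stand-in for "the middle of the bulk") with the log ratio of their size factors. Below is a minimal sketch under that assumption; the helper `compareBulkToSizeFactors` is hypothetical, and the simulated counts are made up so the example runs in plain R without DESeq.

```r
# Hypothetical helper: numeric counterpart of the MA-plot check.
# counts_mat: genes x samples raw counts (e.g. counts(cds))
# sf:         size factors, one per sample (e.g. sizeFactors(cds))
compareBulkToSizeFactors <- function( counts_mat, sf, s1, s2 ) {
   # use only genes with nonzero counts in both samples (log10(0) is -Inf)
   ok <- counts_mat[ , s1 ] > 0 & counts_mat[ , s2 ] > 0
   M <- log10( counts_mat[ ok, s2 ] ) - log10( counts_mat[ ok, s1 ] )
   c( bulkMedian = median( M ),
      sizeFactorRatio = unname( log10( sf[ s2 ] ) - log10( sf[ s1 ] ) ) )
}

# Simulated example (made-up data): sample 2 is sequenced twice as deeply
set.seed( 2 )
mu <- runif( 3000, 20, 1000 )
m <- cbind( rpois( length( mu ), mu ), rpois( length( mu ), 2 * mu ) )
res <- compareBulkToSizeFactors( m, sf = c( 1, 2 ), s1 = 1, s2 = 2 )
res   # both entries should be close to log10(2), about 0.30
```

With the DESeq object from the demonstration, the call would be `compareBulkToSizeFactors( counts(cds), sizeFactors(cds), s1, s2 )`. The median is used because it is largely unaffected by the minority of genes that really are differentially expressed; if the two numbers differ clearly, the red line will visibly miss the bulk of the points.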

I hope that helps.

   Simon


