[BioC] number of genes for DESeq analysis

Simon Anders anders at embl.de
Thu Feb 16 18:34:53 CET 2012


Dear Vladimir

> I have carried out an RNAseq experiment with 4 conditions, 2
> biological replicates of each. At the moment, I am interested in how
> my conditions differ in terms of expression of a subset of 36 genes.
> The idea is to count only the reads, which correspond to those 36
> genes and use this piece of data for the analysis of their
> differential expression across the conditions. Will this approach be
> valid? What is the minimum number of genes required by the statistical
> model implemented in DESeq? I apologize if the question is too naive.

What is wrong with doing the analysis for all genes, and then looking 
only at those that you are interested in?

For the dispersion estimation, you should use all available genes. 
The multiple testing adjustment is a different matter. If you really 
selected the list of 36 genes prior to your experiment, or at least 
independently of your RNA-Seq data, and you do not intend to look at 
any further genes to decide on the hypothesis you currently have in 
mind, you might be justified in performing the adjustment on the raw 
p values of only those 36 genes, which would surely improve your 
power. To do so, subset them from the "pvalue" column of the final 
result and hand them to the 'p.adjust' function (with 'method="BH"').
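
A minimal sketch of that last step, under stated assumptions: 'res' 
is a made-up stand-in for the result table of the full analysis, with 
an "id" column and a raw p value column named "pvalue" as in the text 
above (check colnames() of your own result, as the name may differ), 
and 'genesOfInterest' is a hypothetical vector holding the 36 
pre-selected gene IDs. Only 'p.adjust' is a real function here; 
everything else is illustrative.

```r
# Stand-in for the full result table (one row per gene). In a real
# analysis this comes from testing ALL genes; the random p values
# here are placeholders only.
set.seed(1)
res <- data.frame(
  id     = paste0("gene", 1:1000),
  pvalue = runif(1000),
  stringsAsFactors = FALSE
)

# Hypothetical list of 36 genes, chosen independently of the data.
genesOfInterest <- paste0("gene", 1:36)

# Keep only the rows belonging to the genes of interest.
sub <- res[res$id %in% genesOfInterest, ]

# Benjamini-Hochberg adjustment over these 36 raw p values only,
# instead of over all genes in the table.
sub$padjSubset <- p.adjust(sub$pvalue, method = "BH")

head(sub)
```

Note that only the adjustment is restricted to the 36 genes; the 
table itself still comes from an analysis of all genes, so the 
dispersion estimates are unaffected.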

   Simon


