[BioC] DESeq adjusted pvalue calculation / filtering data

Fri Nov 25 20:05:39 CET 2011

Dear Markus,

there are several questions in your mail; I will try to answer them 
separately.

1. Storey's qvalues: While, technically, the applicability of Storey's 
method might be a bit narrower than that of Benjamini and Hochberg's, 
within transcriptomics both are usually equally applicable, and, in 
practice, Storey's does give more results.

Internally, DESeq calculates the adjusted p values with something like

   res$padj <- p.adjust( res$pval, method="BH" )

You can also convert the raw p values (res$pval) yourself with Storey's 
package if you have it installed. Beware that it does not handle NAs 
well; you may need to take out the NA p values before the conversion 
and put them back in afterwards.
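
A minimal sketch of this take-out-and-put-back step, using simulated p 
values rather than real DESeq output. The helper name 
adjust_skipping_na is made up for illustration, and the call to 
qvalue::qvalue assumes that Storey's package is the Bioconductor 
"qvalue" package and that it is installed; otherwise that part is 
skipped.

```r
# Hypothetical helper (not part of DESeq): run an adjustment function on
# the non-NA p values only and put the NAs back in their original places.
adjust_skipping_na <- function(pvals, adjust_fun) {
  out <- rep(NA_real_, length(pvals))
  ok <- !is.na(pvals)
  out[ok] <- adjust_fun(pvals[ok])
  out
}

# Simulated raw p values standing in for res$pval (an assumption):
# mostly uniform "null" p values plus some small ones, with a few NAs.
set.seed(1)
pval <- c(runif(900), rbeta(100, 0.5, 20))
pval[c(5, 50, 500)] <- NA

# Benjamini-Hochberg adjustment, as in the line quoted from DESeq above
padj_bh <- adjust_skipping_na(pval, function(p) p.adjust(p, method = "BH"))

# Storey's q values, only if the qvalue package is available
if (requireNamespace("qvalue", quietly = TRUE)) {
  qval <- adjust_skipping_na(pval, function(p) qvalue::qvalue(p)$qvalues)
}
```

With real data you would pass res$pval instead of the simulated vector.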

2. Independent filtering: In the newest version of the DESeq vignette, 
we have added a section on independent filtering. Removing all genes 
with, say, an average count below 10 does give you some extra hits.
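
As a rough illustration of the idea (not the exact procedure from the 
vignette), one can drop the weakly expressed genes first and adjust 
only the remaining p values. The data frame below is simulated; the 
column names baseMean and pval are meant to mirror a DESeq result 
table, the threshold of 10 is the example value from above, and 
padj_filtered is a name introduced here.

```r
# Simulated stand-in for a DESeq result table (an assumption, not real data)
set.seed(42)
n_genes <- 1000
res <- data.frame(
  id       = paste0("gene", seq_len(n_genes)),
  baseMean = rexp(n_genes, rate = 1 / 50),  # average count per gene
  pval     = runif(n_genes)                 # raw p values
)

# Filter on the average count, a criterion that does not look at the
# p values, and adjust only the genes that pass the filter.
keep <- res$baseMean >= 10
res$padj_filtered <- NA_real_
res$padj_filtered[keep] <- p.adjust(res$pval[keep], method = "BH")

# Number of genes tested with and without the filter
c(all = nrow(res), kept = sum(keep))
```

Fewer tests mean a milder multiple-testing correction, which is where 
the extra hits come from.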

3. The real reason that you have so few hits is your lack of replicates. 
In this situation, DESeq reports by design only those hits that are 
strikingly obvious, and doing otherwise with a sound analysis method is 
impossible. You cannot expect to get useful results with a flawed 
experimental design -- and while the two points above might give you a 
few extra hits, you are unlikely to get usable results without fixing 
your experiment.

4. Sequencing depth: Remember that it is the total number of counts per 
gene and _condition_ (not: sample) that gives you power for weakly 
expressed genes, and the number of replicates that gives you power for 
the strongly expressed genes. Hence, whenever practically feasible, it 
is always better to sequence many biological replicate samples to 
moderate depth than to sequence a few samples very deeply. (Of course, 
even if replicates are difficult to obtain, two replicates are the 
minimum. Doing an experiment without that is pointless.)
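
A back-of-the-envelope sketch of this trade-off, assuming counts follow 
a negative binomial distribution with variance mu + dispersion * mu^2. 
The squared coefficient of variation of a condition mean over n 
replicates is then (1/mu + dispersion) / n, i.e. 1/(total counts per 
condition) + dispersion/n. The first term matters for weak genes, the 
second for strong ones. The dispersion of 0.1, the expression levels 
and the two designs are made-up example values, and 
cv_condition_mean is a name introduced here.

```r
# Coefficient of variation of the mean count in one condition, under the
# assumed negative binomial model var = mu + dispersion * mu^2.
cv_condition_mean <- function(mu_per_sample, n_rep, dispersion) {
  sqrt((1 / mu_per_sample + dispersion) / n_rep)
}

dispersion <- 0.1                           # assumed biological variability
mu_moderate <- c(weak = 2, strong = 2000)   # expected counts at moderate depth

# Same total sequencing effort per condition, spent in two ways:
# 2 replicates at 4x depth versus 8 replicates at moderate depth.
cv_few_deep      <- cv_condition_mean(4 * mu_moderate, n_rep = 2, dispersion)
cv_many_moderate <- cv_condition_mean(mu_moderate,     n_rep = 8, dispersion)

print(rbind(few_deep = cv_few_deep, many_moderate = cv_many_moderate))
```

Under these assumptions the design with more replicates gives the 
smaller coefficient of variation for both the weak and the strong gene.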

   Simon