[BioC] DESeq adjusted pvalue calculation / filtering data
Simon Anders
anders at embl.de
Fri Nov 25 20:05:39 CET 2011
Dear Markus,
there are several questions in your mail; I will try to answer them separately.
1. Storey's qvalues: While, technically, the applicability of Storey's
method might be a bit narrower than that of Benjamini and Hochberg's,
within transcriptomics both are usually equally applicable, and
Storey's does give more results.
Internally, DESeq calculates the adjusted p values with something like

   res$padj <- p.adjust( res$pval, method="BH" )

You can also convert the raw p values (res$pval) yourself with Storey's
qvalue package if you have it installed. Beware that it does not handle
NAs well; you may need to take out the NA p values before calling it and
put them back in afterwards.
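
As a rough sketch of the NA handling described above: the p values below
are simulated stand-ins for res$pval, not real DESeq output, and the
qvalue step only runs if the qvalue package happens to be installed.

```r
# Simulated stand-in for res$pval (not real DESeq output): mostly null
# p values, a few small ones, and some NAs.
set.seed(42)
pval <- c(runif(900), rbeta(100, 0.5, 20))
pval[sample(length(pval), 50)] <- NA

# Benjamini-Hochberg, as in the line above; p.adjust() keeps NAs as NA.
padj <- p.adjust(pval, method = "BH")

# Storey's q values: run on the non-NA p values only, then write the
# results back into a vector of the original length.
ok <- !is.na(pval)
qval <- rep(NA_real_, length(pval))
if (requireNamespace("qvalue", quietly = TRUE)) {
  qval[ok] <- qvalue::qvalue(pval[ok])$qvalues
}
```

Both padj and qval then line up row by row with the original result
table, with NA wherever the raw p value was NA.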
2. Independent filtering: In the newest version of the DESeq vignette,
we have added a section on independent filtering. Removing, for example,
all genes with an average count below 10 before the multiple-testing
adjustment does give you some extra hits.
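
A minimal sketch of such a filter, using simulated counts and p values
(the names counts, pval and keep are made up for this illustration, and
the vignette may do it differently). The point is only that the filter
criterion ignores the condition labels and that the adjustment is done on
the genes that pass.

```r
# Simulated stand-ins (not real data): a count matrix with 4 samples and
# one raw p value per gene, playing the role of res$pval.
set.seed(1)
n_genes <- 2000
mu <- rexp(n_genes, rate = 1 / 50)   # hypothetical mean expression per gene
counts <- matrix(rnbinom(n_genes * 4, mu = mu, size = 5), nrow = n_genes)
pval <- runif(n_genes)

# Filter on a criterion that ignores the condition labels: average count.
keep <- rowMeans(counts) >= 10

# Adjust only the p values of the genes that pass; the others stay NA.
padj_filtered <- rep(NA_real_, n_genes)
padj_filtered[keep] <- p.adjust(pval[keep], method = "BH")

table(passed_filter = keep)
```

Whether this gives extra hits depends on the data; with the uniform
p values simulated here there is nothing to find, so the sketch only shows
the mechanics.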
3. The real reason that you have so few hits is your lack of replicates.
In this situation, DESeq reports by design only those hits that are
strikingly obvious, and doing otherwise with a sound analysis method is
impossible. You cannot expect to get useful results with a flawed
experimental design -- and while the two points above might give you a
few extra hits, you are unlikely to get usable results without fixing your
experiment.
4. Sequencing depth: Remember that it is the total number of counts per
gene and _condition_ (not per sample) that gives you power for weakly
expressed genes, and the number of replicates that gives you power for
the strongly expressed genes. Hence, whenever practically feasible, it
is always better to sequence many biological replicate samples to
moderate depth than to sequence a few samples very deeply. (Of course,
even if replicates are difficult to obtain, two replicates is the
minimum. Doing an experiment without that is pointless.)
Simon