[BioC] Filtering out tags with low counts in DESeq and edgeR?
Simon Anders
anders at embl.de
Sat May 21 11:02:46 CEST 2011
Hi Xiaohui
I agree that it is worrying to get such different results from your two
approaches to using DESeq. Here are a few suggestions for how you might
investigate this (and I'd be eager to hear about your findings):
- Bourgon et al. (PNAS, 2010, 107:9546) have studied how pre-filtering
affects the validity and power of a test. They stress that it is
important that the filter is blind to the sample labels (actually: even
permutation invariant). So what you do here is not statistically sound:
> filter=dat[rowSums(dat[,group1]>= 8) | rowSums(dat[,group2]>= 8), ]
Try instead something like:
filter=dat[rowSums(dat) >= 16, ]
- How does your filter affect the variance functions? Do the plots
generated by 'scvPlot()' differ between the filtered and the unfiltered
data set?
- If so, are the hits that you get at expression strengths where the
variance functions differ? Are they at the low end, i.e., where the
filter made changes?
- Have you tried what happens if you filter after estimating variance?
The raw p values should be the same as without filtering, but the
adjusted p values might get better.
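As a rough sketch of the label-blind filter suggested above: the count
matrix below is made up purely for illustration, and only rowSums() and
the threshold of 16 come from the suggestion itself. The point is that
the filter looks at the total count per gene, so shuffling the sample
labels cannot change which genes are kept.

```r
# Toy count matrix: 6 genes x 4 samples (made-up numbers, illustration only)
dat <- matrix(
  c( 0,  1,  0,  2,
    10,  9,  0,  1,
     5,  4,  6,  3,
     0,  0,  9, 12,
    20, 25, 18, 30,
     1,  0,  2,  1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("A1", "A2", "B1", "B2"))
)

# Label-blind filter: uses only the total count per gene
keep <- rowSums(dat) >= 16
filtered <- dat[keep, ]

# Permuting the columns (i.e. the sample labels) gives the same filter
shuffled <- dat[, sample(ncol(dat))]
same_filter <- identical(rowSums(shuffled) >= 16, keep)
```

A per-group criterion such as the one quoted above would not pass this
check in general, because it depends on which columns belong to which
group.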
To be honest, I'm currently a bit at a loss which one is more correct:
Filtering before or after variance estimation. Let's hear what other
people on the list think.
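To make the "filter after testing" idea concrete, here is a small
sketch with made-up raw p values and total counts (both hypothetical; in
practice the p values would come from the test run on all genes). The
raw p values of the retained genes are untouched, and only the multiple
testing adjustment is redone on the smaller set.

```r
# Hypothetical raw p values and total counts for 10 genes (made up)
pval  <- c(0.004, 0.012, 0.02, 0.30, 0.45, 0.52, 0.61, 0.74, 0.88, 0.95)
total <- c(120, 85, 40, 3, 60, 2, 5, 1, 4, 0)

# Label-blind filter on the total count, applied after testing
keep <- total >= 16

# Adjust once over all genes, and once over the retained genes only
padj_all  <- p.adjust(pval, method = "BH")
padj_kept <- p.adjust(pval[keep], method = "BH")

# Number of hits at an FDR of 0.05, before and after filtering
hits_all  <- sum(padj_all < 0.05)
hits_kept <- sum(padj_kept < 0.05)
```

In this toy case the adjusted p values improve because the dropped
genes all had large p values. That is not guaranteed in general: if the
filter removes genes with small p values, the adjusted values of the
remaining genes can get worse.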
> 2. For EdgeR
DESeq and edgeR are sufficiently similar that any correct answer
regarding filtering should apply to both.
> 2) I got 800 DE genes with p.value<0.1, but got 0 DE genes after adjusting p.value, is this possible? Then, can I used the *unadjusted* p.value to get DE genes?
> To adjust pvalue, I used: nde.adjust=sum(p.adjust(de.p, method = "BH")< 0.05)
Of course, this is possible. (Read up on the "multiple hypothesis
testing problem" if this is unclear to you.) Note also, though, that you
used an FDR of .1 in your DESeq code but of .05 here.
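A minimal sketch of why this can happen, under the assumption that no
gene is truly differentially expressed and the p values are spread
evenly over [0, 1] (the figure of 8000 genes is hypothetical, chosen so
that the numbers resemble yours): about a tenth of the raw p values fall
below 0.1 by chance alone, yet none survives the BH adjustment.

```r
# Idealised null p values: 8000 genes, evenly spread over (0, 1)
# (hypothetical; real p values would come from the test)
n    <- 8000
de.p <- (seq_len(n) - 0.5) / n

# Genes with raw p value below 0.1: roughly 10% of all genes
nde.raw <- sum(de.p < 0.1)

# Genes that remain after Benjamini-Hochberg adjustment
nde.adjust <- sum(p.adjust(de.p, method = "BH") < 0.05)
```

So 800 genes with a raw p value below 0.1 is about what one would
expect from 8000 tests with no real signal at all, which is why the
unadjusted p values cannot be used to call DE genes.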
Simon