[BioC] Filtering out tags with low counts in DESeq and edgeR?

Sat May 21 11:02:46 CEST 2011

Hi Xiaohui

I agree that it is worrying to get such different results from your two 
approaches to using DESeq. Here are a few suggestions for how you might 
investigate this (and I'd be eager to hear about your findings):

- Bourgon et al. (PNAS, 2010, 107:9546) have studied how pre-filtering 
affects the validity and power of a test. They stress that it is 
important that the filter is blind to the sample labels (actually: even 
permutation invariant). So what you do here is not statistically sound:

 > filter=dat[rowSums(dat[,group1]>= 8) | rowSums(dat[,group2]>= 8), ]

Try instead something like:

filter=dat[rowSums(dat) >= 16, ]
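
To make the point about label-blind filtering concrete, here is a minimal
sketch in base R. The count table, its dimensions and the sample names are
simulated stand-ins (not your data); only the threshold of 16 is taken from
the suggestion above.

```r
# Simulated count table: 1000 hypothetical tags, 6 hypothetical samples
# (3 per condition). These numbers are assumptions for illustration only.
set.seed(1)
dat <- matrix(rnbinom(6000, mu = 10, size = 2), ncol = 6,
              dimnames = list(paste0("tag", 1:1000),
                              c("A1", "A2", "A3", "B1", "B2", "B3")))

# Label-blind filter: uses only the total count per tag.
keep <- rowSums(dat) >= 16
filtered <- dat[keep, , drop = FALSE]

# Shuffling the sample columns (i.e. permuting the labels) must not
# change which tags are kept.
perm <- sample(ncol(dat))
keep_perm <- rowSums(dat[, perm]) >= 16
stopifnot(identical(keep, keep_perm))
```

Because the filter only looks at row totals, reordering the columns cannot
change which tags survive, which is the permutation invariance mentioned
above.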

- How does your filter affect the variance functions? Do the plots 
generated by 'scvPlot()' differ between the filtered and the unfiltered 
data set?

- If so, are the hits that you get at expression strengths where the 
variance functions differ? Are they at the low end, i.e., where the 
filter made changes?

- Have you tried what happens if you filter after estimating variance? 
The raw p values should be the same as without filtering, but the 
adjusted p values might get better.
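
As a rough illustration of the last point, the sketch below uses made-up raw
p values and total counts for ten hypothetical tags (they do not come from
DESeq or from your data) and applies the filter only after testing. The raw
p values are left untouched; only the set passed to p.adjust() changes.

```r
# Made-up raw p values and total counts for 10 hypothetical tags.
pval  <- c(0.001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9)
total <- c(120, 85, 300, 40, 3, 5, 2, 60, 4, 1)

# Adjust over all tags (no filtering).
padj_all <- p.adjust(pval, method = "BH")

# Filter after testing: same raw p values, but adjust only over the
# tags that pass the label-blind filter on total counts.
keep <- total >= 16
padj_kept <- p.adjust(pval[keep], method = "BH")

# Number of tags called at an FDR of 0.05, with and without filtering.
hits_all  <- sum(padj_all < 0.05)
hits_kept <- sum(padj_kept < 0.05)
```

In this toy case three tags pass without filtering and four pass with it,
because fewer tests enter the adjustment. That improvement is not guaranteed
in general; it depends on which tags the filter removes.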

To be honest, I'm currently a bit at a loss as to which one is more correct: 
Filtering before or after variance estimation. Let's hear what other 
people on the list think.

> 2. For EdgeR

DESeq and edgeR are sufficiently similar that any correct answer 
regarding filtering should apply to both.

> 2) I got 800 DE genes with p.value<0.1, but got 0 DE genes after adjusting p.value, is this possible? Then, can I used the *unadjusted* p.value to get DE genes?
> To adjust pvalue, I used: nde.adjust=sum(p.adjust(de.p, method = "BH")<  0.05)

Of course, this is possible. (Read up on the "multiple hypothesis 
testing problem" if this is unclear to you.) Note also, though, that you 
used an FDR of .1 in your DESeq code but of .05 here.
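
To see why this can happen, consider an idealised case with no truly DE tags,
where the raw p values are spread evenly between 0 and 1. The sketch below
assumes 8000 tags, a number I picked only so that 10% of them gives 800; it
is not taken from your data.

```r
# Hypothetical number of tags; chosen so that 10% of them is 800.
n_tags <- 8000

# Idealised null: raw p values evenly spread over (0, 1].
de.p <- seq_len(n_tags) / n_tags

# Tags with a raw p value of at most 0.1.
n_raw <- sum(de.p <= 0.1)

# Benjamini-Hochberg adjustment, as in the quoted p.adjust() call.
nde.adjust <- sum(p.adjust(de.p, method = "BH") < 0.05)
```

In this setting 800 tags have a raw p value of at most 0.1 purely by chance,
and none of them survives the BH adjustment.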

   Simon