[BioC] Too many (?) differentially expressed genes - edgeR and DESeq
Simon Anders
anders at embl.de
Tue Jul 23 11:05:19 CEST 2013
Hi Darya
A dispersion of 0.02 is typical for cell line experiments, but only for
simple ones. For an experiment involving a full month of incubation, it
really seems quite low. On the other hand, you do have quite drastic
changes: very many genes change more than 32-fold (5 log2 units on the MA
plot), and they would still be significant even if you had a higher
dispersion estimate.
Given the dramatic changes in phenotype that one sees in
differentiation, the strong changes in gene expression are not that
surprising. In the end, it seems entirely reasonable to me to say that
hardly any gene is at the same level in a stem cell as in a terminally
differentiated cell.
Remember that a significant p value only means that the gene's log fold
change is not zero and that the observed _direction_ of change is likely
the true one. It says nothing about the magnitude of the change. You are
hence in a situation where you are no longer interested in _which_ genes
change (because the answer simply is: most of them), but in the strength
of the change: Which genes have changed dramatically, which genes
strongly and which have changed only a bit? Hence, you should now look
at fold changes rather than p values.
Using ordinary log2 fold change values can give you a misleading
picture: as you can see in the MA plot, weakly expressed genes seem to
have the strongest changes, but this is only an artifact, because for
genes with low counts the fold-change estimates are more variable and
hence more likely to be exaggerated.
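To make the point about weak genes concrete, here is a small simulation
sketch (plain Python, not DESeq or edgeR code). It assumes negative
binomial counts with a dispersion of 0.02 and no true change at all
between the two conditions. The means (5 and 500), the pseudocount, and
the helper names are made up for illustration only.

```python
import math
import random
import statistics

def poisson(rate, rng):
    """Draw a Poisson count by counting exponential arrivals within one time unit."""
    count, t = 0, rng.expovariate(rate)
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count

def nb_count(mean, dispersion, rng):
    """Negative binomial draw as a gamma-Poisson mixture
    (variance = mean + dispersion * mean**2)."""
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(rate, rng)

def lfc_spread(mean, n_genes, dispersion=0.02, pseudocount=0.5, seed=1):
    """Standard deviation of naive log2 fold changes when the true change is zero.
    The pseudocount is an illustrative choice to avoid log(0)."""
    rng = random.Random(seed)
    lfcs = []
    for _ in range(n_genes):
        a = nb_count(mean, dispersion, rng)  # condition A, one sample
        b = nb_count(mean, dispersion, rng)  # condition B, one sample
        lfcs.append(math.log2((b + pseudocount) / (a + pseudocount)))
    return statistics.pstdev(lfcs)

weak_sd = lfc_spread(mean=5, n_genes=2000)     # weakly expressed genes
strong_sd = lfc_spread(mean=500, n_genes=500)  # strongly expressed genes
print(f"SD of log2 fold change, mean count 5:   {weak_sd:.2f}")
print(f"SD of log2 fold change, mean count 500: {strong_sd:.2f}")
```

Although no gene changes in this simulation, the naive log2 fold changes
of the low-count genes scatter much more widely, which is the flaring
one sees at the left end of an MA plot.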
This is why we introduced "shrunken log2 fold changes" in DESeq2: they
give you a more realistic picture of the strength of changes across the
dynamic range. See the DESeq2 vignette, and especially this tutorial for
more explanations:
http://www.bioconductor.org/help/course-materials/2013/CSAMA2013/tuesday/afternoon/DESeq2_parathyroid.pdf
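The general idea of shrinkage can be illustrated with a toy calculation.
The sketch below is not DESeq2's actual estimation procedure; it only
assumes a zero-centred normal prior on the log2 fold change and a
delta-method approximation for its standard error. The function name,
the prior width (prior_sd), and the example counts are hypothetical.

```python
import math

def shrunken_lfc(count_a, count_b, prior_sd=1.0, dispersion=0.02):
    """Return (naive, shrunken) log2 fold change for one gene.

    Toy model: the naive estimate is pulled towards zero by a weight that
    depends on how noisy the estimate is relative to the prior width.
    """
    lfc = math.log2(count_b / count_a)
    # Approximate variance of the natural-log ratio for negative binomial
    # counts (delta method): Poisson part plus dispersion part.
    var_ln = 1.0 / count_a + 1.0 / count_b + 2.0 * dispersion
    se2 = var_ln / math.log(2) ** 2  # convert to log2 units
    weight = prior_sd ** 2 / (prior_sd ** 2 + se2)
    return lfc, weight * lfc

# Two genes with the same naive 4-fold change but very different counts.
weak_raw, weak_shrunk = shrunken_lfc(2, 8)
strong_raw, strong_shrunk = shrunken_lfc(200, 800)
print(f"weak gene:   naive {weak_raw:.2f}, shrunken {weak_shrunk:.2f}")
print(f"strong gene: naive {strong_raw:.2f}, shrunken {strong_shrunk:.2f}")
```

Both genes have the same naive log2 fold change, but the low-count
estimate is pulled much closer to zero because it carries less
information, while the high-count estimate stays almost unchanged.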
Simon