[BioC] DEseq2 metagenomic analysis without replicates

Thu Jan 16 16:02:23 CET 2014

Hi Kristina

I'll write a somewhat longer reply, to advertise a kind of paradigm shift
that we are now exploring in DESeq2.

First a note to other readers of this thread: Kristina has sent me a
private mail explaining the experiment in more detail. To sum this up
without disclosing anything, the three "treatments" labelled "none",
"life" and "dead" are not so much different treatments applied to one
kind of sample but rather three different (actually: very different)
manners in which the samples were collected, which are hence expected to
capture quite different subsets of the species that are present at the
sample collection spots.

Therefore, it is no surprise that observed OTU abundances differ so
greatly between the "treatments". Remember that the purpose of a test
for differential expression in its usual sense is to find genes for which
you can reject with confidence the null hypothesis that the gene is
_not_at_all_ affected by the treatment, i.e., that its abundance stays
exactly the same.

It is quite common that this specific null hypothesis turns out to be a
bit silly. All genes are connected by the cell's complicated regulatory
network, and hence, there typically might not be a single gene which is
not at least very slightly affected by the treatment. In essence, what a
significant p value really means is that the effect was big enough
that we can say with confidence in which _direction_ the gene has
changed. Quite often, the effect strength required to establish the
direction of change is larger than the effect strength required to make
the change of biological interest, and therefore, one looks at p values.
If it is the other way round, one needs to also have a cut-off on fold
change or do a banded test (see below).
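
To make the difference between the two null hypotheses concrete, here is
a minimal Python sketch. It is not DESeq2 code: the function names, the
normal approximation and the example numbers are all assumptions made for
illustration. It takes an estimated log2 fold change and its standard
error and compares the usual test ("no change at all") with a banded test
("the change is at most `threshold` in absolute value").

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_value_no_change(lfc, se):
    """Two-sided p value for the null 'true log2 fold change is exactly 0'.

    Hypothetical helper; assumes lfc / se is approximately standard normal.
    """
    return 2.0 * normal_sf(abs(lfc) / se)


def p_value_banded(lfc, se, threshold):
    """Conservative p value for the null '|true log2 fold change| <= threshold'.

    Hypothetical helper; the estimate is compared against the edge of the
    band instead of against zero, and the result is capped at 1.
    """
    excess = abs(lfc) - threshold
    return min(1.0, 2.0 * normal_sf(excess / se))


# Made-up numbers: a small but very precisely measured change.
lfc, se = 0.5, 0.1
print(p_value_no_change(lfc, se))    # tiny: the direction is well established
print(p_value_banded(lfc, se, 1.0))  # 1.0: no evidence the change exceeds 2-fold
```

With a threshold of 0 the banded version reduces to the usual test, which
is one way to see that the usual test asks only about direction.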

In your case, it is also clear a priori that no OTU will have the same
abundance in two "treatments", given that the different sample
collection approaches favour quite different species. Hence, it is no
surprise that at a reasonable cut-off on the adjusted p values (say,
0.1), nearly all OTUs are significant.

This means that you are in the favourable situation that you can take
the estimated differences between treatments pretty much at face value.
However, you want to know _how_ precise they actually are, and for this,
DESeq2 now reports standard errors alongside all estimated log fold
changes. These standard errors, and not the p values, are likely the more
useful output for you.
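
As a rough illustration of how a standard error can be read, the Python
sketch below turns an estimated log2 fold change and its standard error
into an approximate 95% interval, on both the log2 scale and the fold
change scale. The function names, the normal approximation (z = 1.96) and
the example values are assumptions for illustration, not DESeq2 output.

```python
def lfc_interval(lfc, se, z=1.96):
    """Approximate confidence interval for a log2 fold change.

    Hypothetical helper; assumes the estimate is roughly normally
    distributed around the true value with standard deviation `se`.
    """
    return lfc - z * se, lfc + z * se


def fold_change_interval(lfc, se, z=1.96):
    """The same interval, converted from the log2 scale to fold changes."""
    lower, upper = lfc_interval(lfc, se, z)
    return 2.0 ** lower, 2.0 ** upper


# Made-up numbers: the same estimate with a small and a large standard error.
for se in (0.25, 1.5):
    lower, upper = lfc_interval(2.0, se)
    print(f"se={se}: log2 fold change between {lower:.2f} and {upper:.2f}")
```

An interval that stays far from zero means the estimate can be taken at
face value; one that spans zero means even the direction is uncertain.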

  Simon