[BioC] DEseq2 metagenomic analysis without replicates
Kristina M Fontanez
fontanez at mit.edu
Tue Jan 14 19:10:55 CET 2014
Simon-
Thank you again for your help answering all of my questions. I really appreciate the time you’ve taken on this problem.
For the dead vs seawater comparison, 111/ 344 with p-adjusted values <1e-5 have log2Fold changes of > 3 or < -3. So, if I use that as my significance threshold - that’s still a lot of taxa. I have attached the MAplot created by:
> plotMA(jam,lfcColname="TREATMENT_Dead_vs_None",pvalCutoff=0.00001,main="Design ~Treatment+Depth")
Next, I tried your suggestion of just doing ~Treatment and I found 123 taxa with p-adjusted values <1e-5 have log2Fold changes of > 3 or < -3.
> jes
class: DESeqDataSet
dim: 2151 12
exptData(0):
assays(1): counts
rownames(2151): OTU1 OTU2 ... OTU2150 OTU2151
rowData metadata column names(0):
colnames(12): HD5_150L HD5_150D ... HD9_SW_300_22 HD9_SW_500_22
colData names(5): DEPTH TREATMENT Cmg.m2.day Nmg.m2.day Pmg.m2.day
> design(jes)
~TREATMENT
> colData(jes)$TREATMENT<-relevel(colData(jes)$TREATMENT,"None")
> jes<-estimateSizeFactors(jes)
> jes<-estimateDispersions(jes,fitType="local")
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> jes<-nbinomWaldTest(jes)
> resultsNames(jes)
[1] "Intercept" "TREATMENT_Dead_vs_None" "TREATMENT_Live_vs_None"
> DvsN_clean<-results(jes,"TREATMENT_Dead_vs_None",independentFiltering=TRUE)
> DvsN_clean<-DvsN_clean[order(DvsN_clean$padj,na.last=NA),]
> head(DvsN_clean)
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
OTU1608 234.50106 5.150802 0.4392783 11.72560 9.423191e-32 1.897831e-28
OTU1643 592.54806 5.594136 0.4892074 11.43510 2.792133e-30 2.811678e-27
OTU808 908.49633 -4.542030 0.4091021 -11.10244 1.220707e-28 8.195015e-26
OTU1576 216.68941 4.960188 0.4562776 10.87099 1.584772e-27 7.979329e-25
OTU1574 222.80627 5.204206 0.4836326 10.76066 5.279121e-27 2.126430e-24
OTU52 82.98462 -3.508510 0.3291026 -10.66084 1.551899e-26 5.209207e-24
> write.csv(DvsN_clean,file="Dead_vs_Seawater.Treat.iF_true.cook_true.csv")
> plotMA(jes,lfcColname="TREATMENT_Dead_vs_None",pvalCutoff=0.00001,main="Design ~Treatment")
At this point, it looks like including the effect of Depth reduces the total number of taxa under my significance threshold, but not by much. It seems like I need to decide what log2fold changes are biologically meaningful rather than relying on the p-adjusted values as a guide.
Thanks,
Kristina
------------------------------------------------------------------
Kristina Fontanez, Postdoctoral Fellow
fontanez at mit.edu<mailto:fontanez at mit.edu>
Massachusetts Institute of Technology
Department of Civil and Environmental Engineering
48-120E
15 Vassar Street
Cambridge, MA 02139
On Jan 14, 2014, at 12:18 PM, Simon Anders <anders at embl.de<mailto:anders at embl.de>> wrote:
On 14/01/14 18:03, Kristina M Fontanez wrote:
I agree that for the treatment effect, combining the dispersions across
depths is the way to proceed with these data. I think your reasoning for
depth as an ordered factor is sound and likely applies to the depth
profile I am working with. I attempted to follow your suggestion
(snipped below) to create a design that allows me to use depth as an
ordered factor. However, I found it confusing that the design is
~treatment + depth.
With depth as the second variable, doesn’t that mean
that the dispersions will be combined across treatments? I thought the
goal was the combine dispersions across depths so that the treatments
could be compared.
Oh, I was just one step ahead.
To see the effect of treatment, just leave out the depth factor, i.e.,
use "~ treatment". Only once you want to see the effect of depth, use "~
treatment + depth", with depth as a numeric variable (not an ordered
factor, I may have been a bit unclear there).
In the code below I used a ~TREATMENT+DEPTH design with depth as
numerical vectors and the “None” (seawater) treatment as the base level
of the treatment factor. My main concern is that the output results in
hundreds of differentially abundant taxa (344 OTUs for the Dead vs
Seawater comparison with p-adjusted values <1e-5). These results include
both independent Filtering and the cooksCutoff (by default). Given the
really high numbers I am inclined to think that a large number of these
are false positives and we have modeled the dispersion incorrectly. What
do you think of this assessment?
Maybe try again with ignoring "depth" altogether, even though, in
principle, including ti should not be wrong. Also have a look at the
estimated fold changes. Maybe they are significant but very small. An MA
plot would help here, and also allow to check the normalization.
I'm still puzzled what kind of treatment "live" and "dead" might be. I
find it hard to come up with a guess what kind of treatment might have
no effect on most taxa, which nevertheless has such a drastic name as
"dead". Can you explain a bit more about the biology? (By private mail,
in case you don't want to expose unpublished work on a public mailing list.)
Simon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MAplot_des_treatplusdepth.pdf
Type: application/pdf
Size: 332374 bytes
Desc: MAplot_des_treatplusdepth.pdf
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140114/ed754c3d/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MAplot_des_treatonly.pdf
Type: application/pdf
Size: 331704 bytes
Desc: MAplot_des_treatonly.pdf
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140114/ed754c3d/attachment-0003.pdf>
More information about the Bioconductor
mailing list