[BioC] DEseq2 metagenomic analysis without replicates

Kristina Fontanez [guest] guest at bioconductor.org
Mon Jan 13 20:08:23 CET 2014


I am working with a metagenomic dataset in DESeq2 that does not have true biological replicates but has multiple factors and levels. Due to the difficulty in obtaining these samples, replicates were not an option. The study design consists of seawater microbes collected at 4 depths. For each depth, there are 3 treatments (live, dead and none). I expect there to be true biological differences in abundance both among depths and among treatments. So, combining the samples across depths is not a great option because I expect there to be some variation in abundance of taxa among depths. The same holds true for combining samples across treatments. However, if I have to choose one I would expect there to be less variation among depths than among treatments. 

The goal is to assess the differences in abundance among treatments and among depths. Ultimately, I would like to create relative abundance heat maps and bar charts displaying the taxa which are differentially abundant among treatments and among depths. I would also like to cluster the samples using PCoA or nMDS to see the effects of treatment and depth on sample similarity. 

Is it possible to normalize these data (using variance-stabilized normalization or regularized log transformation) without choosing a study design? In my perusing of the mailing list, one suggestion was to use a design ~ 1 to estimate the dispersions so that the dispersions treat all samples as replicates. In my case, that is not an ideal choice due to the expected variation in abundance of taxa among depths and treatments.

So, I see two options for moving forward that I would like feedback on:

1) Make relative abundance heat maps and bar charts based on simple counts/library size proportions. Later, to do differential abundance tests, pool samples across depths to compare the 3 treatments (4 replicates per treatment), calculating the dispersion using ~Treatment as the study design.

2) Normalize count data using a design ~1 option in DEseq2, export normalized counts. As far I can tell - there isn’t a documented procedure to do this in manual, vignettes or on the mailing list. The main problem here is that I can’t figure out a way to export the regularized log transformed counts.

One approach to (2) for the regularized log transformation is below and the variable Deptreat_rlog contains the regularized log transformed count data which is the same size as the original dataset (but the “counts” matrix is inaccessible):

> design(Deptreat)<- ~1
> Deptreat
class: DESeqDataSet
dim: 1064 12
assays(1): counts
rownames(1064): OTU1 OTU2 ... OTU1063 OTU1064
rowData metadata column names(0):
colnames(12): HD5_150L HD5_150D ... HD5_500D HD9_SW_500_22
colData names(5): Treatment Depth Cmg.m2.day Nmg.m2.day Pmg.m2.day 
> Deptreat<-estimateSizeFactors(Deptreat)  
> Deptreat<-estimateDispersions(Deptreat,fitType="local")
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates 
> plotDispEsts(Deptreat)
> Deptreat_rlog=rlogData(Deptreat)
> dim(Deptreat_rlog)
[1] 1064   12
> assay(Deptreat_rlog)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘assay’ for signature ‘"matrix", "missing"’
> traceback()
3: stop(gettextf("unable to find an inherited method for function %s for signature %s", 
       sQuote(fdef at generic), sQuote(cnames)), domain = NA)
2: (function (classes, fdef, mtable) 
       methods <- .findInheritedMethods(classes, fdef, mtable)
       if (length(methods) == 1L) 
       else if (length(methods) == 0L) {
           cnames <- paste0("\"", sapply(classes, as.character), 
               "\"", collapse = ", ")
           stop(gettextf("unable to find an inherited method for function %s for signature %s", 
               sQuote(fdef at generic), sQuote(cnames)), domain = NA)
       else stop("Internal error in finding inherited methods; didn't return a unique method", 
           domain = NA)
   })(list("matrix", "missing"), function (x, i, ...) 
   standardGeneric("assay"), <environment>)
1: assay(Deptreat_rlog)

Thank you for your insights!


 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] DESeq2_1.2.5            RcppArmadillo_0.3.920.1 Rcpp_0.10.6            
 [4] GenomicRanges_1.14.3    XVector_0.2.0           IRanges_1.20.5         
 [7] BiocGenerics_0.8.0      ggplot2_0.9.3.1         scales_0.2.3           
[10] phyloseq_1.7.9         

loaded via a namespace (and not attached):
 [1] ade4_1.5-2           annotate_1.40.0      AnnotationDbi_1.24.0
 [4] ape_3.0-11           Biobase_2.22.0       biom_0.3.10         
 [7] Biostrings_2.30.1    cluster_1.14.4       codetools_0.2-8     
[10] colorspace_1.2-4     DBI_0.2-7            dichromat_2.0-0     
[13] digest_0.6.3         foreach_1.4.1        genefilter_1.44.0   
[16] grid_3.0.2           gtable_0.1.2         igraph_0.6.6        
[19] iterators_1.0.6      labeling_0.2         lattice_0.20-24     
[22] locfit_1.5-9.1       MASS_7.3-29          Matrix_1.1-0        
[25] multtest_2.18.0      munsell_0.4.2        nlme_3.1-113        
[28] permute_0.7-0        plyr_1.8             proto_0.3-10        
[31] RColorBrewer_1.0-5   reshape2_1.2.2       RJSONIO_1.0-3       
[34] RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2        
[37] stringr_0.6.2        survival_2.37-4      tools_3.0.2         
[40] vegan_2.0-9          XML_3.95-0.2         xtable_1.7-1        

Sent via the guest posting facility at bioconductor.org.

More information about the Bioconductor mailing list