[BioC] DESeq estimateDispersion query

Fri Jan 27 13:27:59 CET 2012

Hello List

I have a question about the use of DESeq to analyse RNA-Seq data from paired samples. 

Briefly my experimental design involves cells from 6 animals either infected or uninfected. I would like to use DESeq to examine differences between these groups taking the biological pairing into account. The data are very variable (I think - I've no experience with RNA-Seq data until now) with some animals showing very little response and others showing large movement for a given gene; some animals have low expression of a given gene while others have large counts.

I have used DESeq to simply look at the control v infected groups and, on FC this gives me biologically relevant data but my FDR never drops below 1. I am assuming that taking the animal pairing into account will help overcome the inherent variablilty in the dataset

Anyway my question relates to creating the estimateDispersions function in DESeq. I create my CountDataSet object following the directions in the DESeq vignette, passing in counts and a dataframe of factors (group = infection & description = animal). 

cds <- newCountDataSet( counts, conds )

> class(cds)
[1] "CountDataSet"
attr(,"package")
[1] "DESeq"

> cds
CountDataSet (storageMode: environment)
assayData: 12063 features, 12 samples 
  element names: counts 
protocolData: none
phenoData
  sampleNames: 2713:InfAF 2700:InfAF ... 2684:Cont (12 total)
  varLabels: sizeFactor group description
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:  

> pData(cds)
           sizeFactor group description
2713:InfAF  1.1611902 InfAF        2713
2700:InfAF  1.0896422 InfAF        2700
2700:Cont   1.0799056  Cont        2700
2721:Cont   0.9315077  Cont        2721
2721:InfAF  1.3688457 InfAF        2721
2713:Cont   1.6007580  Cont        2713
2714:InfAF  0.6519720 InfAF        2714
2714:Cont   0.9796116  Cont        2714
2689:Cont   1.0116211  Cont        2689
2689:InfAF  0.4251266 InfAF        2689
2684:InfAF  0.7910000 InfAF        2684
2684:Cont   2.2416324  Cont        2684

group = infection and description = sample i.e. animal

So, all this seems to work fine. However when I try to run:

cds <- estimateDispersions( cds )

I get the following error:

Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]),  : 
  'x' must be an array of at least two dimensions

I can't reproduce this error with the 'pasilla' dataset used in the vignette so it must be something to do with the way I'm creating my CountDataSet object. To me it looks just like the one in the vignette though!

Can anyone shed any light on this?

I have successfully used edgeR for paired sample analysis and this gives biologically relevant results along with good FDR. I was interested in the comparison with DESeq for my own education as much as anything else.

Thanks

iain

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=C                LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] annotate_1.32.1        GO.db_2.6.1            goseq_1.6.0           
 [4] geneLenDataBase_0.99.8 BiasedUrn_1.04         org.Bt.eg.db_2.6.4    
 [7] RSQLite_0.11.1         DBI_0.2-5              AnnotationDbi_1.16.11 
[10] edgeR_2.4.3            limma_3.10.2           DESeq_1.6.1           
[13] locfit_1.5-6           lattice_0.20-0         akima_0.5-7           
[16] Biobase_2.14.0        

loaded via a namespace (and not attached):
 [1] biomaRt_2.10.0        Biostrings_2.22.0     BSgenome_1.22.0      
 [4] genefilter_1.36.0     geneplotter_1.32.1    GenomicFeatures_1.6.7
 [7] GenomicRanges_1.6.6   grid_2.14.1           IRanges_1.12.5       
[10] Matrix_1.0-3          mgcv_1.7-13           nlme_3.1-103         
[13] RColorBrewer_1.0-5    RCurl_1.9-5           rtracklayer_1.14.4   
[16] splines_2.14.1        survival_2.36-10      tools_2.14.1         
[19] XML_3.9-2             xtable_1.6-0          zlibbioc_1.0.0       
>