[BioC] DESeq estimateDispersion query
Iain Gallagher
iaingallagher at btopenworld.com
Fri Jan 27 13:27:59 CET 2012
Hello List
I have a question about the use of DESeq to analyse RNA-Seq data from paired samples.
Briefly my experimental design involves cells from 6 animals either infected or uninfected. I would like to use DESeq to examine differences between these groups taking the biological pairing into account. The data are very variable (I think - I've no experience with RNA-Seq data until now) with some animals showing very little response and others showing large movement for a given gene; some animals have low expression of a given gene while others have large counts.
I have used DESeq to simply look at the control v infected groups and, on FC this gives me biologically relevant data but my FDR never drops below 1. I am assuming that taking the animal pairing into account will help overcome the inherent variablilty in the dataset
Anyway my question relates to creating the estimateDispersions function in DESeq. I create my CountDataSet object following the directions in the DESeq vignette, passing in counts and a dataframe of factors (group = infection & description = animal).
cds <- newCountDataSet( counts, conds )
> class(cds)
[1] "CountDataSet"
attr(,"package")
[1] "DESeq"
> cds
CountDataSet (storageMode: environment)
assayData: 12063 features, 12 samples
element names: counts
protocolData: none
phenoData
sampleNames: 2713:InfAF 2700:InfAF ... 2684:Cont (12 total)
varLabels: sizeFactor group description
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
> pData(cds)
sizeFactor group description
2713:InfAF 1.1611902 InfAF 2713
2700:InfAF 1.0896422 InfAF 2700
2700:Cont 1.0799056 Cont 2700
2721:Cont 0.9315077 Cont 2721
2721:InfAF 1.3688457 InfAF 2721
2713:Cont 1.6007580 Cont 2713
2714:InfAF 0.6519720 InfAF 2714
2714:Cont 0.9796116 Cont 2714
2689:Cont 1.0116211 Cont 2689
2689:InfAF 0.4251266 InfAF 2689
2684:InfAF 0.7910000 InfAF 2684
2684:Cont 2.2416324 Cont 2684
group = infection and description = sample i.e. animal
So, all this seems to work fine. However when I try to run:
cds <- estimateDispersions( cds )
I get the following error:
Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]), :
'x' must be an array of at least two dimensions
I can't reproduce this error with the 'pasilla' dataset used in the vignette so it must be something to do with the way I'm creating my CountDataSet object. To me it looks just like the one in the vignette though!
Can anyone shed any light on this?
I have successfully used edgeR for paired sample analysis and this gives biologically relevant results along with good FDR. I was interested in the comparison with DESeq for my own education as much as anything else.
Thanks
iain
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] annotate_1.32.1 GO.db_2.6.1 goseq_1.6.0
[4] geneLenDataBase_0.99.8 BiasedUrn_1.04 org.Bt.eg.db_2.6.4
[7] RSQLite_0.11.1 DBI_0.2-5 AnnotationDbi_1.16.11
[10] edgeR_2.4.3 limma_3.10.2 DESeq_1.6.1
[13] locfit_1.5-6 lattice_0.20-0 akima_0.5-7
[16] Biobase_2.14.0
loaded via a namespace (and not attached):
[1] biomaRt_2.10.0 Biostrings_2.22.0 BSgenome_1.22.0
[4] genefilter_1.36.0 geneplotter_1.32.1 GenomicFeatures_1.6.7
[7] GenomicRanges_1.6.6 grid_2.14.1 IRanges_1.12.5
[10] Matrix_1.0-3 mgcv_1.7-13 nlme_3.1-103
[13] RColorBrewer_1.0-5 RCurl_1.9-5 rtracklayer_1.14.4
[16] splines_2.14.1 survival_2.36-10 tools_2.14.1
[19] XML_3.9-2 xtable_1.6-0 zlibbioc_1.0.0
>
More information about the Bioconductor
mailing list