[BioC] DEXSeq - identical p-values for all exons in a gene
Pillman, Katherine (Health)
Katherine.Pillman at health.sa.gov.au
Fri Jan 17 01:32:04 CET 2014
Sorry, the problem is the issue title - that for every gene, I get one p-value for all of the exons in that gene. This didn't happen when I ran DEXSeq without specifying the model.
e.g. here is the top of my results table:
geneID exonID dispersion pvalue padjust meanBase log2fold(treated/control)
1/2-SBSRNA4 E001 0.11143032705478 0.910345433347734 1 30.9067618720574 0.015363670607322
1/2-SBSRNA4 E002 0.583064733453923 0.910345433347734 1 6.62219699771489 -0.131125949462748
1/2-SBSRNA4 E003 0.767713143685937 0.910345433347734 1 2.21699055323382 0.0731263380599324
A1BG E001 0.123758213037027 0.229711870771802 1 117.754224941408 0.249562974820941
A1BG E002 0.110317995296594 0.229711870771802 1 152.932234776212 -0.0091268347351396
A1BG E003 0.275216588214621 0.229711870771802 1 7.30724196291172 -0.4763859471178
A1BG E004 0.625237938618851 0.229711870771802 1 10.2174627774744 -0.623376548608748
A1BG E005 1e+08 NA NA 0 0.0978995620155234
A1BG E006 0.435477490574276 0.229711870771802 1 4.18241118313392 -0.0329868528499689
I've noticed the format for specifying the model seems to have changed - the original paper used something like:
formuladispersion <- count ~ sample + (condition + type) * exon
formula0 <- count ~ sample + type * exon + condition
formula1 <- count ~ sample + type * exon + condition * I(exon == exonID)
Whereas the current vignette uses the format I used:
formula1 <- ~ sample + exon + type:exon + condition:exon
formula0 <- ~ sample + exon + type:exon
Thanks for your help!
From: mailinglist.honeypot at gmail.com [mailinglist.honeypot at gmail.com] On Behalf Of Steve Lianoglou [lianoglou.steve at gene.com]
Sent: Friday, 17 January 2014 9:33 AM
To: KP [guest]
Cc: bioconductor at r-project.org list; Pillman, Katherine (Health)
Subject: Re: [BioC] DEXSeq - identical p-values for all exons in a gene
You mention running into "the same problem" no matter how the formulas
are encoded, but I do not see anywhere in your email what this problem
Can you please explain and show explicitly what the problem actually is?
Is an error being thrown somewhere, or?
On Thu, Jan 16, 2014 at 2:48 PM, KP [guest] <guest at bioconductor.org> wrote:
> Hi All,
> I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model.
> I have checked the vignette and I thought my usage was correct:
> formulaFullModel <- ~ sample + exon + libType:exon + condition:exon
> formulaReducedModel <- ~ sample + exon + libType:exon
> ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile)
> ecs <- estimateSizeFactors(ecs)
> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt')
> ecs<- fitDispersionFunction(ecs)
> ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt')
> To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon
> formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...?
> This is the sample table setup for the basic model:
> X countFile condition
> 1 H1 H.r1.counts control
> 2 M1 M.r1.counts treated
> 3 H2ac H.r2ac.counts control
> 4 M2ac M.r2ac.counts treated
> 5 Hshap H.NA.counts control
> 6 Mshap M.NA.counts treated
> I assume the cause is something silly on my part but I am stumped. Any help would be greatly appreciated!
> P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values.
> -- output of sessionInfo():
> Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio.
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>  LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
>  LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C
>  LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
>  parallel stats graphics grDevices utils datasets methods base
> other attached packages:
>  BiocInstaller_1.12.0 DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0
> loaded via a namespace (and not attached):
>  biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 GenomicRanges_1.14.4 hwriter_1.3
>  IRanges_1.20.6 RCurl_1.95-4.1 Rsamtools_1.14.2 statmod_1.4.18 stats4_3.0.2
>  stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
> Sent via the guest posting facility at bioconductor.org.
> Bioconductor mailing list
> Bioconductor at r-project.org
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor