[BioC] DEXSeq - identical p-values for all exons in a gene

Pillman, Katherine (Health) Katherine.Pillman at health.sa.gov.au
Sun Jan 19 23:34:09 CET 2014


Hi Simon,

  Thanks for your response to my message.  I'm sorry but I don't understand what you mean (perhaps this is the root of my problem!). You wrote that I did not specify the full model formula to estimateDispersions.  I am confused because I thought that was what I was doing when I wrote "formula=formulaFullModel" in the line:

>> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt')

Can you please explain further what I have done incorrectly?  Do I have the syntax wrong or something?  Or maybe you mean the whole model (i.e. both formulas, somehow)?

Is it correct that my usage for testForDEU() was correct, but that my problem lies in my call of estimateDispersions()?

Also, this is my sample table:
         X                                              countFile   condition type end
1       H1       H.r1.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts     control plus  pe
2       M1       M.r1.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts     treated plus  pe
3     H2ac     H.r2ac.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts     control  all  pe
4     M2ac     M.r2ac.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts     treated  all  pe
5       Hs       H.NA.RNA-seq_mRNA.tophat2_se.hg19.nsorted.counts     control plus  se
6       Ms       M.NA.RNA-seq_mRNA.tophat2_se.hg19.nsorted.counts     treated plus  se

Thanks again for your help!

Cheers, Kat

(P.S. Thanks so much for your work on DEXSeq and HTSeq, they are both fantastic packages which has saved us a countless amount of time and energy!!)

________________________________________


Hi

In your code

>> ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile)
>> ecs <- estimateSizeFactors(ecs)
>> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt')
>> ecs<- fitDispersionFunction(ecs)
>>
>> ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt')

you specify the full model formula in 'testForDEU' but do not in
'estimateDispersions'.

You have noticed that we changed the way how formulae should be
specified. Before, you needed three formulae, now only two, because we
got rid of the need for a special "dispersion formula". Rather, you use
the full model formula for dispersion estimation, too.

However, you have to pass the formula to _both_ functions.

  Simon

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




From: mailinglist.honeypot at gmail.com [mailinglist.honeypot at gmail.com] On Behalf Of Steve Lianoglou [lianoglou.steve at gene.com]
Sent: Friday, 17 January 2014 9:33 AM
To: KP [guest]
Cc: bioconductor at r-project.org list; Pillman, Katherine (Health)
Subject: Re: [BioC] DEXSeq - identical p-values for all exons in a gene

Hi Katherine,

You mention running into "the same problem" no matter how the formulas
are encoded, but I do not see anywhere in your email what this problem
actually is.

Can you please explain and show explicitly what the problem actually is?

Is an error being thrown somewhere, or?

Thanks,
-steve

On Thu, Jan 16, 2014 at 2:48 PM, KP [guest] <guest at bioconductor.org> wrote:
>
> Hi All,
>
> I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model.
>
> I have checked the vignette and I thought my usage was correct:
>
> formulaFullModel <- ~ sample + exon + libType:exon + condition:exon
> formulaReducedModel <- ~ sample + exon + libType:exon
>
> ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile)
> ecs <- estimateSizeFactors(ecs)
> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt')
> ecs<- fitDispersionFunction(ecs)
>
> ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt')
>
> To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon
> formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...?
>
> This is the sample table setup for the basic model:
>
>       X     countFile condition
> 1    H1   H.r1.counts   control
> 2    M1   M.r1.counts   treated
> 3  H2ac H.r2ac.counts   control
> 4  M2ac M.r2ac.counts   treated
> 5 Hshap   H.NA.counts   control
> 6 Mshap   M.NA.counts   treated
>
>
> I assume the cause is something silly on my part but I am stumped.  Any help would be greatly appreciated!
>
> Thanks!
>
> P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values.
>
>  -- output of sessionInfo():
>
> Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio.
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] BiocInstaller_1.12.0 DEXSeq_1.8.0         Biobase_2.22.0       BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.18.0       Biostrings_2.30.1    bitops_1.0-6         GenomicRanges_1.14.4 hwriter_1.3
>  [6] IRanges_1.20.6       RCurl_1.95-4.1       Rsamtools_1.14.2     statmod_1.4.18       stats4_3.0.2
> [11] stringr_0.6.2        tools_3.0.2          XML_3.98-1.1         XVector_0.2.0        zlibbioc_1.8.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



--
Steve Lianoglou
Computational Biologist
Genentech


More information about the Bioconductor mailing list