[BioC] DEXSeq questions on power and counting bins
Gu [guest]
guest at bioconductor.org
Tue Jul 2 20:48:34 CEST 2013
Dear All:
I am using DEXSeq for testing differential exon usage (DEU) between two conditions, each having 3 biological replicates. The design matrix contains a covariate called "subject", i.e. the same subject had both the control and treatment. I have three questions:
(1) In the vignette of DEXSeq, in Figure 3 the author compared the adjusted p-values from two tests: without the batch effect (x-axis) and with the batch effect (y-axis). Clearly from the plot, the p-values are smaller when the batch effect is accounted for. However, I don't know if we can conclude from such a plot that "with the type-aware analysis, detection power for DEU due to condition is improved"? It is from a real data analysis, so how do we know the significant genes are really true positives? BTW, in my analysis, after accounting for the subject effect, the number of genes with DEU increases from 14 to 24, and the plot for comparing the p-values are similar to Figure 3. Will properly accounting for covariates in DEXSeq always lead to such a conclusion (i.e. increased detection power)?
(2) I got the HTML outputs using DEXSeqHTML() in R. For each gene with DEU, I can see different plot options: counts, expression, splicing and transcripts. By only looking at the plot with differentially expressed exon(s) in color, it seems that the conclusion is only based on "splicing" as I can see the "distance" between the two conditions, but such distance is very small in other plot options. In the vignette, the author also recommend specifying "splicing=TRUE". Can I know what are the differences among those options, and which one is preferred to use (making more biological sense)?
(3) DEXSeq's inference is based on "counting bins", i.e. not real exons but exonic regions redefined from GTF file. My question is, once I obtain a gene with DEU (ENSG00000056558), how can I know which *real exon* are deferentially used? From Ensembl website, this gene has 3 transcripts (splice variants), but in the DEXSeqHTML output, it has 11 exonic regions, and E010 shows DEU -- how can I tell E010 in this case correponds to which real exon?
Thank you so much for your suggestions!
-- output of sessionInfo():
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DEXSeq_1.6.0 Biobase_2.20.0 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5
[4] GenomicRanges_1.12.4 hwriter_1.3 IRanges_1.18.1
[7] RCurl_1.95-4.1 Rsamtools_1.12.3 statmod_1.4.17
[10] stats4_3.0.1 stringr_0.6.2 tools_3.0.1
[13] XML_3.96-1.1 zlibbioc_1.6.0
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list