[BioC] mock comparison p-value histogram in DEXSeq
Alejandro Reyes
alejandro.reyes at embl.de
Fri Jul 12 11:15:52 CEST 2013
Dear Gu,
Thanks for pointing this out!
You are right about this, and we have observed the same when doing
mock comparisons with DEXSeq. The skew of the p-value distribution
towards 1 is caused by the dispersion values used for testing: since we
take the maximum of the fitted value and the per-exon estimate, our
test becomes conservative (hence the skewed distribution). You will see
that if one uses only the per-exon estimates, the p-values are not
skewed, but some annoying outliers will then be called significant.
Admittedly, this maximum rule is not the most elegant solution, but it
is a temporary measure to get rid of outliers. In the future we will
probably integrate new approaches such as the DESeq2 Bayesian shrinkage
in order to avoid doing this.
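
To illustrate why a maximum rule pushes null p-values towards 1, here is
a minimal sketch. It is not DEXSeq code and does not use the negative
binomial GLM: it is a toy Gaussian analogue in which a known variance
stands in for the fitted dispersion and a pooled sample variance stands
in for the per-exon estimate. All names (mock_comparison, fitted_var,
per_exon_var) are hypothetical and chosen for this example only.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def mock_comparison(n_exons=5000, n_per_group=3, fitted_var=1.0, seed=1):
    """Simulate a control-vs-control test for many 'exons' (toy model).

    fitted_var plays the role of the fitted dispersion (here it equals the
    true variance); per_exon_var plays the role of the per-exon estimate.
    Returns p-values using the fitted value only and using the maximum rule.
    """
    rng = random.Random(seed)
    sd = math.sqrt(fitted_var)
    p_fitted, p_max = [], []
    for _ in range(n_exons):
        # Both groups come from the same distribution: no true effect.
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        diff = statistics.fmean(a) - statistics.fmean(b)
        per_exon_var = (statistics.variance(a) + statistics.variance(b)) / 2
        for var, out in (
            (fitted_var, p_fitted),
            (max(fitted_var, per_exon_var), p_max),  # the maximum rule
        ):
            z = diff / math.sqrt(2 * var / n_per_group)
            out.append(two_sided_p(z))
    return p_fitted, p_max


def histogram(pvals, bins=10):
    """Counts of p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


p_fitted, p_max = mock_comparison()
print("fitted value only:    ", histogram(p_fitted))
print("max(fitted, per-exon):", histogram(p_max))
```

Because the maximum can only increase the variance used in the test,
each p-value under the maximum rule is at least as large as the one
obtained with the fitted value alone, so the second histogram gains mass
in the upper bins. The per-exon-only case is deliberately left out, since
this toy model says nothing reliable about how it behaves in DEXSeq.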
Best regards,
Alejandro
> Dear All:
>
> In the DEXSeq paper, the authors compared DEXSeq with Cuffdiff in terms of controlling Type-I error rates. From the mock comparison results (control vs. control), we can see DEXSeq reported far fewer genes with differential exon usage (DEU), as shown in Table S2 of the DEXSeq paper (2012). However, I think this kind of mock comparison is "under the null", which means if we plot a histogram of the p-values from such comparison, it should be very close to the histogram from a uniform random variable. I am not sure if the authors from DEXSeq have checked that, or consider it inappropriate.
>
> I use my dataset to make a control vs. control comparison, and happily find very few genes with DEU (which is good). However, when I plot the raw p-values (not the B-H adjusted p-values), the resultant histogram is not uniform-like. The height of each histogram bin is increasing monotonically, i.e. the frequencies increase as the p-values increase. In other words, there are "so few" small p-values reported for the control vs. control comparison. What can I tell from such a histogram?
>
> The reason why I ask this question is that, even though the number of reported genes with DEU is small using DEXSeq for mock comparison, the p-values, in my thinking, should be uniform-like. I can convince myself with the small numbers, but would be more convinced from the histogram.
>
> Thank you for your suggestions. Please correct me if I am wrong!
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
> [4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] DEXSeq_1.6.0 Biobase_2.20.0 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5
> [4] GenomicRanges_1.12.4 hwriter_1.3 IRanges_1.18.1
> [7] RCurl_1.95-4.1 Rsamtools_1.12.3 statmod_1.4.17
> [10] stats4_3.0.1 stringr_0.6.2 tools_3.0.1
> [13] XML_3.96-1.1 zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>