[BioC] mock comparison p-value histogram in DEXSeq

Fri Jul 12 11:15:52 CEST 2013

Dear Gu,

Thanks for pointing this out!

You are right about this, and we have observed the same when doing
mock comparisons with DEXSeq. The skew of the p-value distribution
towards 1 is caused by the dispersion values used for testing: since we
take the maximum of the fitted value and the per-exon estimate, the
dispersion used is never below the fitted trend, so our test becomes
conservative (hence the skew). If one uses only the per-exon estimates,
the p-values are not skewed, but some annoying outliers get called.
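A toy simulation can illustrate why a "maximum" rule makes a test conservative. This is only a sketch under loudly stated assumptions, not DEXSeq's model: it uses Gaussian data instead of negative-binomial counts, a z-test instead of DEXSeq's likelihood-ratio test, and a hypothetical "fitted" variance that equals the true variance exactly. Every simulated exon is null (control vs. control), so testing with the fitted value alone gives uniform p-values. Taking max(fitted, per-exon) can only inflate the variance, which shrinks every test statistic and pushes the p-values towards 1.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def mock_comparison(n_exons=20000, n_per_group=3, sigma2=1.0, seed=1):
    """Simulate control-vs-control tests, so the null holds for every exon.

    Toy assumptions (NOT DEXSeq's actual model): Gaussian data, a z-test,
    and a 'fitted' variance equal to the true variance sigma2.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    se_scale = math.sqrt(2.0 / n_per_group)
    p_fitted, p_max = [], []
    for _ in range(n_exons):
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        diff = statistics.fmean(a) - statistics.fmean(b)
        # Per-exon (pooled) variance estimate: very noisy with few replicates.
        per_exon = (statistics.variance(a) + statistics.variance(b)) / 2.0
        fitted = sigma2
        # Fitted value only: correct variance, so p-values are uniform.
        p_fitted.append(two_sided_p(diff / (math.sqrt(fitted) * se_scale)))
        # Maximum rule: variance never below the fitted one, so |z| can only
        # shrink and each p-value can only grow.
        v = max(fitted, per_exon)
        p_max.append(two_sided_p(diff / (math.sqrt(v) * se_scale)))
    return p_fitted, p_max


def histogram(pvals, bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    p_fitted, p_max = mock_comparison()
    print("fitted only:", histogram(p_fitted))
    print("max rule   :", histogram(p_max))
```

The first histogram should be roughly flat; the second should rise towards 1, which is the same monotone shape described in the question below.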

Admittedly, this maximum rule is not the most elegant solution; it is
a temporary measure to get rid of outliers. In the future we will
probably integrate newer approaches, such as the Bayesian shrinkage used
in DESeq2, to avoid doing this.

Best regards,
Alejandro

> Dear All:
>
> In the DEXSeq paper, the authors compared DEXSeq with Cuffdiff in terms of controlling Type-I error rates. The mock comparison results (control vs. control) in Table S2 of the DEXSeq paper (2012) show that DEXSeq reported far fewer genes with differential exon usage (DEU). However, this kind of mock comparison is "under the null", which means a histogram of its p-values should look very close to that of a uniform random variable. I am not sure whether the DEXSeq authors have checked this, or consider such a check inappropriate.
>
> I used my dataset to run a control vs. control comparison, and was happy to find very few genes with DEU (which is good). However, when I plot the raw p-values (not the B-H adjusted p-values), the resulting histogram is not uniform-like. The height of the histogram bins increases monotonically, i.e. the frequencies increase as the p-values increase. In other words, "so few" small p-values are reported for the control vs. control comparison. What can I conclude from such a histogram?
>
> I ask this question because, even though the number of genes reported with DEU in the mock comparison is small, I would expect the p-values to be uniform-like. The small numbers are somewhat convincing, but the histogram would convince me more.
>
> Thank you for your suggestions. Please correct me if I am wrong!
>
>   -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
>   [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
>   [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C
> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] DEXSeq_1.6.0       Biobase_2.20.0     BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.16.0       Biostrings_2.28.0    bitops_1.0-5
>   [4] GenomicRanges_1.12.4 hwriter_1.3          IRanges_1.18.1
>   [7] RCurl_1.95-4.1       Rsamtools_1.12.3     statmod_1.4.17
> [10] stats4_3.0.1         stringr_0.6.2        tools_3.0.1
> [13] XML_3.96-1.1         zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>