[BioC] DEXSeq many exons marked as not testable

Wed Sep 19 18:20:50 CEST 2012

Dear Stefan Dentro,

Thanks for your email.  It actually looks strange, could you send the 
output of the function "counts", and subset for this specific gene? 
Could you also do it for the "fData" function??

Alejandro

> Hello,
>
> I'm using DEXSeq to obtain differentially expressed exons between my 15
> samples and 8 controls. But for some reason most exons are not testable and
> are not assigned a p-value. Do you know what I'm doing wrong?
>
> First I've created a conservative list of exons per gene in the human
> genome through the canonical transcripts table in UCSC (194511 in total).
> Next I've obtained read counts for each exon in each sample which are read
> into one big data.frame. Then the following list of commands are executed:
>
> metadata = data.frame(
>    row.names=colnames(dat)[7:29],
>    condition=c(rep("control", 8), rep("data", 15)),
>    replicate=c(c(1:8), c(1:15)),
>    libType=rep("paired-end", 23))
>
> ## Add strand information that is not available in the read counts
> data.frame
> dat = cbind(dat, strand=rep(NA, nrow(dat)))
>
> cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata,
>                         geneIDs = dat[,5], exonIDs = dat[,6],
>                         exonIntervals=dat[,c(2,3,4, 30)])
>
> cds = estimateSizeFactors(cds)
> cds = estimateDispersions(cds, minCount=2, maxExon=90)
> cds = fitDispersionFunction(cds)
> cds = testForDEU(cds)
> cds = estimatelog2FoldChanges(cds)
>
> Now lets see how many exons are expressed significantly different between
> data and controls. Surprisingly only 1850 exons are described here, not the
> full 194511:
>
> res1 = DEUresultTable(cds)
> table(res1$padjust < 0.05)
>
> FALSE  TRUE
>   1424   426
>
> So I've zoomed in to an example gene. Below it can be seen that indeed all
> exons are marked as not testable. But looking at the individual read counts
> per sample (see further below) it does not make sense that they are not
> testable. Some exons indeed have a low read count and are excluded
> rightfully, but others have enough reads to base dispersion on I would
> think.
>
> If I have understood correctly testable can be false in either of these
> cases:
> - Gene has only one exon, dispersion cannot be calculated.
> - Readcounts for an exon are too low.
> - A combination of the above.
>
> But all do not apply here. Any ideas?
>
>
> fData for the gene:
>                  geneID exonID testable dispBeforeSharing dispFitted
> dispersion pvalue padjust chr    start      end strand transcripts
> 136687 ENSG00000009780      1    FALSE                NA  0.2026585
> 0.2026585     NA      NA   1 28052568 28052672   <NA> <NA>
> 108973 ENSG00000009780      2    FALSE                NA  0.9549670
> 0.9549670     NA      NA   1 28053983 28054047   <NA> <NA>
> 1100   ENSG00000009780      3    FALSE                NA  1.9560912
> 1.9560912     NA      NA   1 28056743 28056844   <NA> <NA>
> 1      ENSG00000009780      4    FALSE                NA  0.4722995
> 0.4722995     NA      NA   1 28059114 28059168   <NA> <NA>
> 22096  ENSG00000009780      5    FALSE                NA  0.1146813
> 0.1146813     NA      NA   1 28060542 28060694   <NA> <NA>
> 13670  ENSG00000009780      6    FALSE                NA  0.1127563
> 0.1127563     NA      NA   1 28071165 28071322   <NA> <NA>
> 13666  ENSG00000009780      7    FALSE                NA  0.1450013
> 0.1450013     NA      NA   1 28075579 28075665   <NA> <NA>
> 22095  ENSG00000009780      8    FALSE                NA  0.1160550
> 0.1160550     NA      NA   1 28081706 28081841   <NA> <NA>
> 13665  ENSG00000009780      9    FALSE                NA  0.1129432
> 0.1129432     NA      NA   1 28086037 28086138   <NA> <NA>
> 138629 ENSG00000009780     10    FALSE                NA  0.1112855
> 0.1112855     NA      NA   1 28087006 28087092   <NA> <NA>
>
> Read counts for the gene per sample:
>                 exon_id chr    start      end         gene_id exon_nr
> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
> 1      ENSE00000327880   1 28059114 28059168 ENSG00000009780 4
> 11       8       4       7       3       1       8       3
> 1100   ENSE00000571221   1 28056743 28056844 ENSG00000009780 3
> 1       0       1       1       0       0       0       1
> 13665  ENSE00000761751   1 28086037 28086138 ENSG00000009780 9      91
> 145      86      47     169      36     231     136
> 13666  ENSE00000761753   1 28075579 28075665 ENSG00000009780 7      19
> 62      32      18      44      23      61      38
> 13670  ENSE00000761780   1 28071165 28071322 ENSG00000009780 6      77
> 139      73      66      78      30     168      88
> 22095  ENSE00000866714   1 28081706 28081841 ENSG00000009780 8      78
> 94      52      47      40      41     121      60
> 22096  ENSE00000866716   1 28060542 28060694 ENSG00000009780 5      36
> 61      50      63      67      10     129      69
> 108973 ENSE00001625313   1 28053983 28054047 ENSG00000009780 2
> 3       5       3       0       1       0       4       2
> 136687 ENSE00001909353   1 28052568 28052672 ENSG00000009780 1       2
> 12       5       0      19       0      34      10
> 138629 ENSE00001955917   1 28087006 28087092 ENSG00000009780 10
> 42      96      50      24      98      23     117      79
>         sample9 sample10 sample11 sample12 sample13 sample14 sample15
> sample16 sample17 sample18 sample19 sample20 sample21 sample22
> 1            1        1        0        3        4        1 1
> 0        3        7        0        0        1        0
> 1100         2        0        0        0        0        1 1
> 0        0        3        1        0        0        0
> 13665       78       64      130       55       66       27 75
> 53       49      126       36       37       48       53
> 13666        7       20       51        6       43        8 15
> 19       11       26        7       15       11        5
> 13670       75       85      186       79       86       58 70
> 86       47      149       82       49       66       38
> 22095       64       64      149       51       91       37 67
> 54       58      148       54       41       64       43
> 22096       90       79      287       66       97       29 75
> 92       58      126       67       55       71       51
> 108973       1        0        0        1        1        0 1
> 1        0        4        0        0        0        0
> 136687       5       28       31        6       27        5 4
> 12        5       32        6        7        9       16
> 138629     119      111      233      100      129       59 141
> 103       70      219       92       60       74       67
>         sample23 strand
> 1             0     NA
> 1100          0     NA
> 13665        24     NA
> 13666         7     NA
> 13670        27     NA
> 22095        25     NA
> 22096        21     NA
> 108973        1     NA
> 136687        1     NA
> 138629       54     NA
>
> Design of the experiment:
>           condition replicate    libType
> sample1    control         1 paired-end
> sample2    control         2 paired-end
> sample3    control         3 paired-end
> sample4    control         4 paired-end
> sample5    control         5 paired-end
> sample6    control         6 paired-end
> sample7    control         7 paired-end
> sample8    control         8 paired-end
> sample9       data         1 paired-end
> sample10      data         2 paired-end
> sample11      data         3 paired-end
> sample12      data         4 paired-end
> sample13      data         5 paired-end
> sample14      data         6 paired-end
> sample15      data         7 paired-end
> sample16      data         8 paired-end
> sample17      data         9 paired-end
> sample18      data        10 paired-end
> sample19      data        11 paired-end
> sample20      data        12 paired-end
> sample21      data        13 paired-end
> sample22      data        14 paired-end
> sample23      data        15 paired-end
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8
> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
>   [6] LC_MESSAGES=C              LC_PAPER=C LC_NAME=C
> LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods base
>
> other attached packages:
> [1] plyr_1.7.1          DEXSeq_1.2.1        BiocInstaller_1.4.3
> DESeq_1.8.3         locfit_1.5-7        Biobase_2.16.0
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.34.0      AnnotationDbi_1.18.0 biomaRt_2.12.0
> DBI_0.2-5            genefilter_1.38.0    geneplotter_1.34.0
>   [7] grid_2.15.0          hwriter_1.3          IRanges_1.14.2
> lattice_0.20-6       RColorBrewer_1.0-5   RCurl_1.91-1
> [13] RSQLite_0.11.1       splines_2.15.0       statmod_1.4.15
> stats4_2.15.0        stringr_0.6          survival_2.36-12
> [19] tools_2.15.0         XML_3.9-4            xtable_1.7-0
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor