Hi Alejandro, I just reran the code with only ENSG00000009780 in the dataset. The odd thing is that this time it worked perfectly. Regards, Stefan ## Only ENSG00000009780 > counts(cds) sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 1 11 8 4 7 3 1 8 3 1 1 0 3 4 1 1100 1 0 1 1 0 0 0 1 2 0 0 0 0 1 13665 91 145 86 47 169 36 231 136 78 64 130 55 66 27 13666 19 62 32 18 44 23 61 38 7 20 51 6 43 8 13670 77 139 73 66 78 30 168 88 75 85 186 79 86 58 22095 78 94 52 47 40 41 121 60 64 64 149 51 91 37 22096 36 61 50 63 67 10 129 69 90 79 287 66 97 29 108973 3 5 3 0 1 0 4 2 1 0 0 1 1 0 136687 2 12 5 0 19 0 34 10 5 28 31 6 27 5 138629 42 96 50 24 98 23 117 79 119 111 233 100 129 59 sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 1 1 0 3 7 0 0 1 0 0 1100 1 0 0 3 1 0 0 0 0 13665 75 53 49 126 36 37 48 53 24 13666 15 19 11 26 7 15 11 5 7 13670 70 86 47 149 82 49 66 38 27 22095 67 54 58 148 54 41 64 43 25 22096 75 92 58 126 67 55 71 51 21 108973 1 1 0 4 0 0 0 0 1 136687 4 12 5 32 6 7 9 16 1 138629 141 103 70 219 92 60 74 67 54 > fData(cds) geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end 1 ENSG00000009780 4 TRUE 0.40135710 0.40580351 0.40580351 4.256220e-04 8.512441e-04 1 28059114 28059168 1100 ENSG00000009780 3 TRUE 0.57127358 1.98755125 1.98755125 9.715722e-01 9.715722e-01 1 28056743 28056844 13665 ENSG00000009780 9 TRUE 0.01991574 0.03806432 0.03806432 2.537419e-09 1.268709e-08 1 28086037 28086138 13666 ENSG00000009780 7 TRUE 0.09257216 0.07072531 0.09257216 6.740254e-07 2.246751e-06 1 28075579 28075665 13670 ENSG00000009780 6 TRUE 0.01517262 0.03752086 0.03752086 7.138709e-01 7.931899e-01 1 28071165 28071322 22095 ENSG00000009780 8 TRUE 0.03299040 0.04028325 0.04028325 5.589838e-01 6.987297e-01 1 28081706 28081841 22096 ENSG00000009780 5 TRUE 0.04659993 0.03930375 0.04659993 1.632722e-04 4.081804e-04 1 28060542 28060694 108973 ENSG00000009780 2 TRUE 0.01382188 0.99934256 0.99934256 6.159293e-02 1.026549e-01 1 28053983 28054047 136687 ENSG00000009780 1 TRUE 0.38138204 0.12609379 0.38138204 1.302911e-01 1.861302e-01 1 28052568 28052672 138629 ENSG00000009780 10 TRUE 0.01782266 0.03578012 0.03578012 7.407408e-12 7.407408e-11 1 28087006 28087092 strand transcripts log2fold(data/control) 1 -1.89528866 1100 -0.01821944 13665 -0.79494559 13666 -1.13738728 13670 -0.06639675 22095 0.07349758 22096 0.52608193 108973 -1.59373125 136687 0.56880558 138629 0.85158758 On Wed, Sep 19, 2012 at 5:20 PM, Alejandro Reyes wrote: > Dear Stefan Dentro, > > Thanks for your email. It actually looks strange, could you send the > output of the function "counts", and subset for this specific gene? Could > you also do it for the "fData" function?? > > Alejandro > > > Hello, >> >> I'm using DEXSeq to obtain differentially expressed exons between my 15 >> samples and 8 controls. But for some reason most exons are not testable >> and >> are not assigned a p-value. Do you know what I'm doing wrong? >> >> First I've created a conservative list of exons per gene in the human >> genome through the canonical transcripts table in UCSC (194511 in total). >> Next I've obtained read counts for each exon in each sample which are read >> into one big data.frame. Then the following list of commands are executed: >> >> metadata = data.frame( >> row.names=colnames(dat)[7:29], >> condition=c(rep("control", 8), rep("data", 15)), >> replicate=c(c(1:8), c(1:15)), >> libType=rep("paired-end", 23)) >> >> ## Add strand information that is not available in the read counts >> data.frame >> dat = cbind(dat, strand=rep(NA, nrow(dat))) >> >> cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata, >> geneIDs = dat[,5], exonIDs = dat[,6], >> exonIntervals=dat[,c(2,3,4, 30)]) >> >> cds = estimateSizeFactors(cds) >> cds = estimateDispersions(cds, minCount=2, maxExon=90) >> cds = fitDispersionFunction(cds) >> cds = testForDEU(cds) >> cds = estimatelog2FoldChanges(cds) >> >> Now lets see how many exons are expressed significantly different between >> data and controls. Surprisingly only 1850 exons are described here, not >> the >> full 194511: >> >> res1 = DEUresultTable(cds) >> table(res1$padjust < 0.05) >> >> FALSE TRUE >> 1424 426 >> >> So I've zoomed in to an example gene. Below it can be seen that indeed all >> exons are marked as not testable. But looking at the individual read >> counts >> per sample (see further below) it does not make sense that they are not >> testable. Some exons indeed have a low read count and are excluded >> rightfully, but others have enough reads to base dispersion on I would >> think. >> >> If I have understood correctly testable can be false in either of these >> cases: >> - Gene has only one exon, dispersion cannot be calculated. >> - Readcounts for an exon are too low. >> - A combination of the above. >> >> But all do not apply here. Any ideas? >> >> >> fData for the gene: >> geneID exonID testable dispBeforeSharing dispFitted >> dispersion pvalue padjust chr start end strand transcripts >> 136687 ENSG00000009780 1 FALSE NA 0.2026585 >> 0.2026585 NA NA 1 28052568 28052672 >> 108973 ENSG00000009780 2 FALSE NA 0.9549670 >> 0.9549670 NA NA 1 28053983 28054047 >> 1100 ENSG00000009780 3 FALSE NA 1.9560912 >> 1.9560912 NA NA 1 28056743 28056844 >> 1 ENSG00000009780 4 FALSE NA 0.4722995 >> 0.4722995 NA NA 1 28059114 28059168 >> 22096 ENSG00000009780 5 FALSE NA 0.1146813 >> 0.1146813 NA NA 1 28060542 28060694 >> 13670 ENSG00000009780 6 FALSE NA 0.1127563 >> 0.1127563 NA NA 1 28071165 28071322 >> 13666 ENSG00000009780 7 FALSE NA 0.1450013 >> 0.1450013 NA NA 1 28075579 28075665 >> 22095 ENSG00000009780 8 FALSE NA 0.1160550 >> 0.1160550 NA NA 1 28081706 28081841 >> 13665 ENSG00000009780 9 FALSE NA 0.1129432 >> 0.1129432 NA NA 1 28086037 28086138 >> 138629 ENSG00000009780 10 FALSE NA 0.1112855 >> 0.1112855 NA NA 1 28087006 28087092 >> >> Read counts for the gene per sample: >> exon_id chr start end gene_id exon_nr >> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 >> 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 >> 11 8 4 7 3 1 8 3 >> 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 >> 1 0 1 1 0 0 0 1 >> 13665 ENSE00000761751 1 28086037 28086138 ENSG00000009780 9 91 >> 145 86 47 169 36 231 136 >> 13666 ENSE00000761753 1 28075579 28075665 ENSG00000009780 7 19 >> 62 32 18 44 23 61 38 >> 13670 ENSE00000761780 1 28071165 28071322 ENSG00000009780 6 77 >> 139 73 66 78 30 168 88 >> 22095 ENSE00000866714 1 28081706 28081841 ENSG00000009780 8 78 >> 94 52 47 40 41 121 60 >> 22096 ENSE00000866716 1 28060542 28060694 ENSG00000009780 5 36 >> 61 50 63 67 10 129 69 >> 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 >> 3 5 3 0 1 0 4 2 >> 136687 ENSE00001909353 1 28052568 28052672 ENSG00000009780 1 2 >> 12 5 0 19 0 34 10 >> 138629 ENSE00001955917 1 28087006 28087092 ENSG00000009780 10 >> 42 96 50 24 98 23 117 79 >> sample9 sample10 sample11 sample12 sample13 sample14 sample15 >> sample16 sample17 sample18 sample19 sample20 sample21 sample22 >> 1 1 1 0 3 4 1 1 >> 0 3 7 0 0 1 0 >> 1100 2 0 0 0 0 1 1 >> 0 0 3 1 0 0 0 >> 13665 78 64 130 55 66 27 75 >> 53 49 126 36 37 48 53 >> 13666 7 20 51 6 43 8 15 >> 19 11 26 7 15 11 5 >> 13670 75 85 186 79 86 58 70 >> 86 47 149 82 49 66 38 >> 22095 64 64 149 51 91 37 67 >> 54 58 148 54 41 64 43 >> 22096 90 79 287 66 97 29 75 >> 92 58 126 67 55 71 51 >> 108973 1 0 0 1 1 0 1 >> 1 0 4 0 0 0 0 >> 136687 5 28 31 6 27 5 4 >> 12 5 32 6 7 9 16 >> 138629 119 111 233 100 129 59 141 >> 103 70 219 92 60 74 67 >> sample23 strand >> 1 0 NA >> 1100 0 NA >> 13665 24 NA >> 13666 7 NA >> 13670 27 NA >> 22095 25 NA >> 22096 21 NA >> 108973 1 NA >> 136687 1 NA >> 138629 54 NA >> >> Design of the experiment: >> condition replicate libType >> sample1 control 1 paired-end >> sample2 control 2 paired-end >> sample3 control 3 paired-end >> sample4 control 4 paired-end >> sample5 control 5 paired-end >> sample6 control 6 paired-end >> sample7 control 7 paired-end >> sample8 control 8 paired-end >> sample9 data 1 paired-end >> sample10 data 2 paired-end >> sample11 data 3 paired-end >> sample12 data 4 paired-end >> sample13 data 5 paired-end >> sample14 data 6 paired-end >> sample15 data 7 paired-end >> sample16 data 8 paired-end >> sample17 data 9 paired-end >> sample18 data 10 paired-end >> sample19 data 11 paired-end >> sample20 data 12 paired-end >> sample21 data 13 paired-end >> sample22 data 14 paired-end >> sample23 data 15 paired-end >> >> sessionInfo() >>> >> R version 2.15.0 (2012-03-30) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 >> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 >> [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C >> LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 >> DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 >> [7] BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 >> DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 >> [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 >> lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 >> [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 >> stats4_2.15.0 stringr_0.6 survival_2.36-12 >> [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor >> > > [[alternative HTML version deleted]]