[Bioc-sig-seq] problem when using DEXSeq on genes with 1 exon only

Wolfgang Huber whuber at embl.de
Mon Aug 8 14:45:41 CEST 2011


Dear Jane,

in the case of genes (or better term: loci) with only one exon, DEXSeq's 
testGeneForDEU does not make sense - it is looking for exons within the 
locus that behave differently from the average of all the others: there 
are no others! Hence the parameters are unidentifiable, and the software 
should not even look at them. We'll robustify the software for this case 
in future versions (note that the package is still actively being 
developed).

The case of the NA values that are not explained by there only being one 
exon is more interesting. Can you send us the code to reproduce your 
problem (this also requires the data, which you can send offline, and in 
which you can for instance anonymize the gene and sample annotations).

	Best wishes
	Wolfgang

Jul/28/11 4:23 PM, Jane Merlevede scripsit::
> Hello,
>
> I am using DEXseq for exon differential analysis. The analysis seems to work
> well on genes with several exons, but I have a problem with genes with only
> one exon.
> Let me present you my dataset:
>
> pData(ecs)
>          sizeFactor condition replicate       type
> Raman_1  0.9638822    nonvir         1 paired-end
> HM1_1    1.4000715       vir         1 paired-end
> Raman_2  1.0314049    nonvir         2 paired-end
> Raman_3  1.0734133    nonvir         3 paired-end
> HM1_2    0.7801269       vir         2 paired-end
> HM1_3    0.9488409       vir         3 paired-end
>
>
> table( table (geneIDs(ecs)))
>     1    2    3    4    5    6    7    8
> 3149  893  874  206  100   42   18    4
>
> As you can see, I have 3149 genes (out of 5286) which have only 1 exon.
> There are 9291 exons.
>
> res1<- DEUresultTable(ecs)
>
>> summary(res1)
>                       geneID         exonID     dispersion_CR_est
>   EHI_051420             :   8   E001   :5286   Min.   :0.000e+00
>   EHI_070690             :   8   E002   :2137   1st Qu.:7.176e-03
>   EHI_169670             :   8   E003   :1244   Median :1.693e-02
>   EHI_175010+EHI_175000.1:   8   E004   : 370   Mean   :5.546e-02
>   EHI_005010             :   7   E005   : 164   3rd Qu.:3.641e-02
>   EHI_042710             :   7   E006   :  64   Max.   :1.300e+01
>   (Other)                :9245   (Other):  26   NA's   :3.153e+03
>     dispersion            pvalue             padjust
>   Min.   :2.010e-02   Min.   :   0.0000   Min.   :   0.0000
>   1st Qu.:2.202e-02   1st Qu.:   0.1150   1st Qu.:   0.4597
>   Median :2.628e-02   Median :   0.4054   Median :   0.8086
>   Mean   :3.229e+04   Mean   :   0.4269   Mean   :   0.6775
>   3rd Qu.:4.214e-02   3rd Qu.:   0.7133   3rd Qu.:   0.9510
>   Max.   :1.000e+08   Max.   :   1.0000   Max.   :   1.0000
>                       NA's   :3153.0000   NA's   :3153.0000
>
> There are 3153 NA's values. When checking out what were those NA's values, I
> realized that they were all the genes with only one exon (and only 4 more
> genes in the other cases). I doubt that it is by chance... And they don't
> have weird read counts. Check for example EHI_000010, EHI_000410,
> EHI_000420, EHI_000430, EHI_000440:
>
> geneCountTable(ecs)[1:21,]
>               Raman_1 HM1_1 Raman_2 Raman_3 HM1_2 HM1_3
> EHI_000010       391  1364     464     449   457   560
> EHI_000130     18806  8116   21890   24994  6509 14067
> EHI_000240     24057 21825   15365   26903 21101 27841
> EHI_000250      3374  2689    3938    4854  2310  3556
> EHI_000290       458  3721     437     550  1716  1054
> EHI_000410       896  1041    1086     978   905   745
> EHI_000420       140   220     244     117   140    78
> EHI_000430       583  2043     304     634  1398  1197
> EHI_000440      3103 15309    5358    3619  9152 10392
> EHI_000450       358   559     991     516   281   564
> EHI_000460.1     357   118     449     425    43   746
> EHI_000470.1     408  2657     297     397   946   909
> EHI_000480       842  1702     858     778   841   847
> EHI_000490       187   648     221     239   370   249
> EHI_000500       329   506     454     402   244   555
> EHI_000520       482  1615     883     440   925   751
> EHI_000530      1794  1588    1751    1966  1099  1860
> EHI_000540      2857  2504    3706    4152  1479  2547
> EHI_000550      1967  3999    1241    1904  2505  2448
> EHI_000560       152   434     239     165   273   175
> EHI_000570      1209  2142    2183    1213  1905  2662
>
>
> res1[1:20,]
>                         geneID exonID dispersion_CR_est dispersion     pvalue
> EHI_000010:001     EHI_000010   E001                NA 0.02377712         NA
> EHI_000130:001     EHI_000130   E001       0.074669451 0.07466945 0.48673539
> EHI_000130:002     EHI_000130   E002       0.128972813 0.12897281 0.94279527
> EHI_000130:003     EHI_000130   E003       0.019602219 0.02031871 0.84327914
> EHI_000130:004     EHI_000130   E004       0.067337196 0.06733720 0.70429765
> EHI_000240:001     EHI_000240   E001       0.043753997 0.04375400 0.74939865
> EHI_000240:002     EHI_000240   E002       0.058562700 0.05856270 0.74939865
> EHI_000250:001     EHI_000250   E001       0.031440630 0.03144063 0.70444841
> EHI_000250:002     EHI_000250   E002       0.037457450 0.03745745 0.70444841
> EHI_000290:001     EHI_000290   E001       0.021113888 0.02396819 0.05100089
> EHI_000290:002     EHI_000290   E002       0.007626346 0.03471905 0.95676220
> EHI_000290:003     EHI_000290   E003       0.036889359 0.03688936 0.03057067
> EHI_000410:001     EHI_000410   E001                NA 0.02235347         NA
> EHI_000420:001     EHI_000420   E001                NA 0.03395573         NA
> EHI_000430:001     EHI_000430   E001                NA 0.02219522         NA
> EHI_000440:001     EHI_000440   E001                NA 0.02037263         NA
> EHI_000450:001     EHI_000450   E001       0.024495906 0.04492663 0.64717121
> EHI_000450:002     EHI_000450   E002       0.020356260 0.02483641 0.64717121
> EHI_000460.1:001 EHI_000460.1   E001                NA 0.02602179         NA
> EHI_000470.1:001 EHI_000470.1   E001                NA 0.02254333         NA
>                     padjust
> EHI_000010:001          NA
> EHI_000130:001   0.8639624
> EHI_000130:002   0.9927663
> EHI_000130:003   0.9756197
> EHI_000130:004   0.9503086
> EHI_000240:001   0.9611397
> EHI_000240:002   0.9611397
> EHI_000250:001   0.9503086
> EHI_000250:002   0.9503086
> EHI_000290:001   0.2890521
> EHI_000290:002   0.9962043
> EHI_000290:003   0.2087239
> EHI_000410:001          NA
> EHI_000420:001          NA
> EHI_000430:001          NA
> EHI_000440:001          NA
> EHI_000450:001   0.9278993
> EHI_000450:002   0.9278993
> EHI_000460.1:001        NA
> EHI_000470.1:001        NA
>
> I checked the model in the documentation and I don't see why the test does
> not give a result in this case. Has anyone get the same problem?
>
> Thanks for your help,
> Jane Merlevède
>
> 	[[alternative HTML version deleted]]
>
>
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioc-sig-sequencing mailing list