[Bioc-devel] Possible bug in association of gene annotation to topTags in edgeR package

rcaloger raffaele.calogero at gmail.com
Wed Jun 9 20:36:17 CEST 2010


Hi,
I have implemented edgeR in oneChannelGUI to detect differential 
expression in NGS data.
I observed an inconsistency in the topTags output: the annotation 
does not line up with the subset of selected differentially expressed 
genes.
The cause seems to be a misalignment between the gene annotation 
($genes) and the results ($table) in the deDGEList object produced by 
exactTest, which occurs when exactTest removes rows that have zero counts:
 > sm.n <- exactTest(ecd, c("sm", "n"))
Comparison of groups:  n - sm
Warning message:
In DGEList(counts = object$counts[, this.pair], group = group.pair,  :
   Removing42rows that all have zero counts.
 > sm.n
An object of class "deDGEList"
$table
                 logConc  logFC p.value
ENSG00000199386   -20.6  0.722   0.338
ENSG00000199180   -14.0  0.353   0.551
ENSG00000199295   -21.4  0.632   0.418
ENSG00000199095   -23.2 -0.432   0.793
ENSG00000198976   -19.4  0.474   0.516
4654 more rows ...

$comparison
[1] "sm" "n"

$genes
                 chr strand    start      end feature
ENSG00000199386  14      1 67726950 67727053    gene
ENSG00000199180  13      1 92002997 92003088    gene
ENSG00000199295   2      1 83884859 83884966    gene
ENSG00000199095  19      1 54291144 54291210    gene
ENSG00000198976   1      1  1104385  1104467    gene
4696 more rows ...

Debugging topTags shows that the row names of the table and of the 
gene annotation are inconsistent:
debug(topTags)
de.smn<- topTags(sm.n, n=157, adjust.method="BH", sort.by="p.value")
...
Browse[2]> identical(rownames(object$genes)[chosen], 
rownames(object$table)[chosen])
[1] FALSE
...
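
The mismatch can be reproduced without edgeR. The following is only a 
minimal sketch in base R under stated assumptions: the objects counts, 
genes, tab and chosen are made-up toy stand-ins, and the filtering step 
merely imitates what exactTest appears to do; it is not the edgeR code.

```r
# Toy data (hypothetical): 5 tags, two of which (g2, g4) have zero counts
# in every library.
ids <- paste("g", 1:5, sep = "")
counts <- matrix(c(3, 0, 7, 0, 2,
                   1, 0, 4, 0, 5),
                 ncol = 2,
                 dimnames = list(ids, c("sm", "n")))
genes <- data.frame(chr = c("14", "13", "2", "19", "1"),
                    row.names = ids)

# Imitate the filtering: all-zero rows are dropped from the results table,
# while the annotation is left untouched.
keep <- rowSums(counts) > 0
tab <- data.frame(total = unname(rowSums(counts)[keep]),
                  row.names = ids[keep])

# Positional indexing into both tables, as a topTags-like function might do.
chosen <- seq_len(nrow(tab))
mismatch <- !identical(rownames(genes)[chosen], rownames(tab)[chosen])
print(mismatch)
```

In this toy case the annotation keeps 5 rows while the table keeps 3, so 
the same positions point to different tags and mismatch is TRUE.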

If I remove from $genes the rows that are not present in $table of the deDGEList:
sm.n$genes <- 
sm.n$genes[which(rownames(sm.n$genes)%in%rownames(sm.n$table)),]
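
The %in% filter keeps the annotation in its own order, so it relies on 
the table rows appearing in that same order. A possibly more defensive 
variant matches by row name instead. This is only a sketch: align_genes 
is a hypothetical helper, not an edgeR function, and tab and genes are 
toy stand-ins for sm.n$table and sm.n$genes.

```r
# Hypothetical helper (not part of edgeR): subset and reorder the annotation
# so that its rows follow the rows of the results table, matching by row name.
align_genes <- function(genes, res_table) {
  idx <- match(rownames(res_table), rownames(genes))
  if (any(is.na(idx))) {
    stop("some tags in the table have no annotation row")
  }
  genes[idx, , drop = FALSE]
}

# Toy stand-ins for sm.n$table and sm.n$genes (made-up values).
tab <- data.frame(logFC = c(0.7, -0.4, 0.5),
                  row.names = c("g3", "g1", "g5"))
genes <- data.frame(chr = c("14", "13", "2", "19", "1"),
                    row.names = c("g1", "g2", "g3", "g4", "g5"))

aligned <- align_genes(genes, tab)

# Fails loudly if the two tables are still out of step.
stopifnot(identical(rownames(aligned), rownames(tab)))
```

With the real object the call would presumably be 
sm.n$genes <- align_genes(sm.n$genes, sm.n$table), assuming both 
components carry the tag identifiers as row names.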

 > sm.n
An object of class "deDGEList"
$table
                 logConc  logFC p.value
ENSG00000199386   -20.6  0.722   0.338
ENSG00000199180   -14.0  0.353   0.551
ENSG00000199295   -21.4  0.632   0.418
ENSG00000199095   -23.2 -0.432   0.793
ENSG00000198976   -19.4  0.474   0.516
4654 more rows ...

$comparison
[1] "sm" "n"

$genes
                 chr strand    start      end feature
ENSG00000199386  14      1 67726950 67727053    gene
ENSG00000199180  13      1 92002997 92003088    gene
ENSG00000199295   2      1 83884859 83884966    gene
ENSG00000199095  19      1 54291144 54291210    gene
ENSG00000198976   1      1  1104385  1104467    gene
4654 more rows ...


Debugging topTags again, the inconsistency between the row names of the 
table and of the gene annotation disappears:
debug(topTags)
de.smn<- topTags(sm.n, n=157, adjust.method="BH", sort.by="p.value")
...
Browse[2]> identical(rownames(object$genes)[chosen], 
rownames(object$table)[chosen])
[1] TRUE


I hope this information helps to fix the bug.
Cheers
Raffaele



###########################################
sessionInfo()

R version 2.11.0 (2010-04-22)
i386-pc-mingw32

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252    
LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  
methods   base

other attached packages:
[1] edgeR_1.6.5        Genominator_1.2.0  GenomeGraphs_1.8.0 
biomaRt_2.4.0      IRanges_1.6.0      RSQLite_0.8-4
[7] DBI_0.2-5

loaded via a namespace (and not attached):
[1] limma_3.4.0  RCurl_1.3-1  tools_2.11.0 XML_2.8-1

-- 

----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
Dipartimento di Scienze Cliniche e Biologiche
c/o Az. Ospedaliera S. Luigi
Regione Gonzole 10, Orbassano
10043 Torino
tel.   ++39 0116705417
Lab.   ++39 0116705408
Fax    ++39 0119038639
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
        raffaele[dot]calogero[at]gmail[dot]com
www:   http://www.bioinformatica.unito.it
Info: http://publicationslist.org/raffaele.calogero


