[Bioc-devel] Possible bug in association of gene annotation to topTags in edgeR package
rcaloger
raffaele.calogero at gmail.com
Wed Jun 9 20:36:17 CEST 2010
Hi,
I have implemented edgeR in oneChannelGUI to detect differential
expression in NGS data.
I observed an inconsistency in the topTags output: the annotation does
not align correctly with the subset of selected differentially
expressed genes.
The cause seems to be a misalignment between the gene annotation ($genes)
and the results table ($table) in the deDGEList object produced by
exactTest, which arises when exactTest removes rows that have zero counts:
> sm.n <- exactTest(ecd, c("sm", "n"))
Comparison of groups: n - sm
Warning message:
In DGEList(counts = object$counts[, this.pair], group = group.pair, :
Removing 42 rows that all have zero counts.
> sm.n
An object of class "deDGEList"
$table
logConc logFC p.value
ENSG00000199386 -20.6 0.722 0.338
ENSG00000199180 -14.0 0.353 0.551
ENSG00000199295 -21.4 0.632 0.418
ENSG00000199095 -23.2 -0.432 0.793
ENSG00000198976 -19.4 0.474 0.516
4654 more rows ...
$comparison
[1] "sm" "n"
$genes
chr strand start end feature
ENSG00000199386 14 1 67726950 67727053 gene
ENSG00000199180 13 1 92002997 92003088 gene
ENSG00000199295 2 1 83884859 83884966 gene
ENSG00000199095 19 1 54291144 54291210 gene
ENSG00000198976 1 1 1104385 1104467 gene
4696 more rows ...
Debugging topTags makes it clear that the row names of the results
table and of the gene annotation are inconsistent:
debug(topTags)
de.smn<- topTags(sm.n, n=157, adjust.method="BH", sort.by="p.value")
...
Browse[2]> identical(rownames(object$genes)[chosen],
rownames(object$table)[chosen])
[1] FALSE
...
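The mismatch can be reproduced without edgeR. The following is only a
minimal sketch in base R: it uses a toy list (not a real deDGEList) with
made-up gene names and p-values, and it assumes that rows are picked by
position, as the check above suggests. The table has lost one row while
the annotation has not, so the same positional index picks different
genes from each component.

```r
# Toy stand-in for a deDGEList: NOT the real class, just a list with the
# same two components. Gene names and values are made up for illustration.
genes <- data.frame(chr = c("1", "2", "3", "4"),
                    row.names = c("geneA", "geneB", "geneC", "geneD"))

# Pretend geneB had zero counts in both groups and was dropped from the table.
tab <- data.frame(p.value = c(0.20, 0.01, 0.50),
                  row.names = c("geneA", "geneC", "geneD"))

toy <- list(table = tab, genes = genes)

# Positional selection of the two rows with the smallest p-values.
chosen <- order(toy$table$p.value)[1:2]

picked.table <- rownames(toy$table)[chosen]  # from the filtered table
picked.genes <- rownames(toy$genes)[chosen]  # from the unfiltered annotation

# Same check as in the browser session above.
aligned <- identical(picked.genes, picked.table)
print(aligned)
```

In this toy case the table yields geneC and geneA, while the annotation
yields geneB and geneA for the same index.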
If I drop from the gene annotation the rows that are absent from the
table of the deDGEList:
sm.n$genes <-
sm.n$genes[which(rownames(sm.n$genes)%in%rownames(sm.n$table)),]
> sm.n
An object of class "deDGEList"
$table
logConc logFC p.value
ENSG00000199386 -20.6 0.722 0.338
ENSG00000199180 -14.0 0.353 0.551
ENSG00000199295 -21.4 0.632 0.418
ENSG00000199095 -23.2 -0.432 0.793
ENSG00000198976 -19.4 0.474 0.516
4654 more rows ...
$comparison
[1] "sm" "n"
$genes
chr strand start end feature
ENSG00000199386 14 1 67726950 67727053 gene
ENSG00000199180 13 1 92002997 92003088 gene
ENSG00000199295 2 1 83884859 83884966 gene
ENSG00000199095 19 1 54291144 54291210 gene
ENSG00000198976 1 1 1104385 1104467 gene
4654 more rows ...
Debugging topTags again, the inconsistency between the row names of the
results table and of the gene annotation disappears:
debug(topTags)
de.smn<- topTags(sm.n, n=157, adjust.method="BH", sort.by="p.value")
...
Browse[2]> identical(rownames(object$genes)[chosen],
rownames(object$table)[chosen])
[1] TRUE
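A possible variant of this workaround is to index the annotation by the
row names of the table instead of filtering with %in%, which also
guarantees the same row order in both components. Again this is only a
sketch in base R on a toy list (made-up gene names, not a real
deDGEList), and it assumes that every row name of the table is present
in the annotation.

```r
# Toy stand-in for a deDGEList; names and values are made up.
genes <- data.frame(chr = c("1", "2", "3", "4"),
                    row.names = c("geneA", "geneB", "geneC", "geneD"))

# The table has one gene fewer and, here, a different row order.
tab <- data.frame(p.value = c(0.20, 0.01, 0.50),
                  row.names = c("geneC", "geneA", "geneD"))

toy <- list(table = tab, genes = genes)

# For each row of the table, find the position of that gene in the annotation.
idx <- match(rownames(toy$table), rownames(toy$genes))

# Assumption: no table row is missing from the annotation.
stopifnot(!anyNA(idx))

# Reorder and subset the annotation; drop = FALSE keeps it a data frame
# even though it has a single column.
toy$genes <- toy$genes[idx, , drop = FALSE]

realigned <- identical(rownames(toy$genes), rownames(toy$table))
print(realigned)
```

With %in% alone the annotation keeps its original order, which is
sufficient only when the table rows are in that same order.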
I hope this information helps in fixing the bug.
Cheers
Raffaele
###########################################
sessionInfo()
R version 2.11.0 (2010-04-22)
i386-pc-mingw32
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets
methods base
other attached packages:
[1] edgeR_1.6.5 Genominator_1.2.0 GenomeGraphs_1.8.0
biomaRt_2.4.0 IRanges_1.6.0 RSQLite_0.8-4
[7] DBI_0.2-5
loaded via a namespace (and not attached):
[1] limma_3.4.0 RCurl_1.3-1 tools_2.11.0 XML_2.8-1
--
----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
Dipartimento di Scienze Cliniche e Biologiche
c/o Az. Ospedaliera S. Luigi
Regione Gonzole 10, Orbassano
10043 Torino
tel. ++39 0116705417
Lab. ++39 0116705408
Fax ++39 0119038639
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
raffaele[dot]calogero[at]gmail[dot]com
www: http://www.bioinformatica.unito.it
Info: http://publicationslist.org/raffaele.calogero