[BioC] Help me understand org.Hs.eg.db
Hervé Pagès
hpages at fhcrc.org
Tue Apr 7 20:32:54 CEST 2009
Hi Daren,
First note that for any Bimap object 'x':
length(mget(mappedRkeys(x), x))
is the same as:
count.mappedRkeys(x)
but the latter is much more efficient.
Furthermore, if 'x' is a right-to-left map like in your case
(see 'summary(x)'), then 'count.mappedRkeys(x)' is equivalent
to 'count.mappedkeys(x)'.
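Here is a minimal base-R sketch of why the two expressions agree. It does not use AnnotationDbi. It assumes a Bimap can be pictured as a two-column table of (left key, right key) edges, and it uses split() as a hypothetical stand-in for mget() on the right keys of a right-to-left map. The table contents are made up for illustration. The result has one list element per distinct mapped right key, so its length equals the number of mapped right keys.

```r
# Hypothetical edge table standing in for a Bimap (not real annotation data)
edges <- data.frame(Lkey = c("a", "a", "b", "d"),
                    Rkey = c("A", "B", "A", "C"),
                    stringsAsFactors = FALSE)

# Stand-in for mget(mappedRkeys(x), x) on a right-to-left map:
# one list element per mapped right key, holding the left keys it maps to
r2l <- split(edges$Lkey, edges$Rkey)

# Stand-in for count.mappedRkeys(x): number of distinct mapped right keys
n_mapped_R <- length(unique(edges$Rkey))

length(r2l)   # 3 (right keys A, B and C)
n_mapped_R    # 3
r2l$A         # "a" "b": right key A maps to two left keys
```

Building the whole list only to take its length is wasted work. This is why counting the distinct keys directly, as count.mappedRkeys() does, is the cheaper route.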
But generally speaking, there is no reason to expect:
nrow(toTable(x)) == count.mappedkeys(x) # generally not true
unless the mapping contained in 'x' is one-to-one.
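A small base-R sketch of this point follows. It uses two hypothetical edge tables, not real annotation data. It compares the number of rows (what nrow(toTable(x)) reports) with the number of distinct mapped keys. The keys are taken to be the right keys, as in a right-to-left map. The two numbers agree only for the one-to-one table.

```r
# Rows of a flat edge table vs. number of distinct mapped (right) keys
compare_counts <- function(edges) {
  c(rows = nrow(edges),
    mapped_keys = length(unique(edges$Rkey)))
}

# Hypothetical one-to-one map: each key is linked to exactly one partner
one_to_one <- data.frame(Lkey = c("a", "b", "c"),
                         Rkey = c("A", "B", "C"),
                         stringsAsFactors = FALSE)

# Hypothetical map where right key "A" is linked to two left keys
not_one_to_one <- data.frame(Lkey = c("a", "a", "b", "d"),
                             Rkey = c("A", "B", "A", "C"),
                             stringsAsFactors = FALSE)

compare_counts(one_to_one)      # rows = 3, mapped_keys = 3
compare_counts(not_one_to_one)  # rows = 4, mapped_keys = 3
```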
Explanation:
'toTable(x)' returns a flat representation of Bimap object 'x', for example:
Lkey Rkey
1 a A
2 a B
3 b A
4 d C
All the edges (or links) of the bipartite graph are listed. Note that
right key "A" is mapped to left keys "a" and "b", so this mapping is
not one-to-one. The left (or right) keys that don't map to anything
don't appear in this table.
'count.mappedRkeys(x)' counts the number of (unique) right keys that
map to at least one left key, i.e. 3 (A, B and C) in the small example above.
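The sketch below rebuilds the example table in base R. It shows that unmapped keys are simply absent from the flat representation and that 3 distinct right keys are mapped. The full key sets all_Lkeys and all_Rkeys are hypothetical. The assumption is that left key "c" and right key "D" exist but map to nothing. On a real Bimap these sets would come from Lkeys(x) and Rkeys(x).

```r
# Hypothetical key universes: left key "c" and right key "D" map to nothing
all_Lkeys <- c("a", "b", "c", "d")
all_Rkeys <- c("A", "B", "C", "D")

# Flat edge table from the example above: only linked pairs are listed
edges <- data.frame(Lkey = c("a", "a", "b", "d"),
                    Rkey = c("A", "B", "A", "C"),
                    stringsAsFactors = FALSE)

# Keys that never appear in the table are the unmapped ones
unmapped_L <- setdiff(all_Lkeys, edges$Lkey)   # "c"
unmapped_R <- setdiff(all_Rkeys, edges$Rkey)   # "D"

# Distinct right keys linked to at least one left key
n_mapped_R <- length(unique(edges$Rkey))       # 3
```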
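So in fact, the following is true for any Bimap object 'x':
length(unique(toTable(x)[[2]])) == count.mappedRkeys(x) # always TRUE
In your case, toTable() lists 39824 links but only 39800 distinct
symbols, so some symbols map to more than one Entrez Gene ID.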
Hope this helps.
Cheers,
H.
Daren Tan wrote:
> I am using two approaches to get EntrezID to genes mapping, as well as
> genes to EntrezID mappings. toTable gives same number of mappings in
> both directions, but mget doesn't. Which approach should I trust and
> why ?
>
>> dim(toTable(org.Hs.egSYMBOL2EG))
> [1] 39824 2
>> dim(toTable(org.Hs.egSYMBOL))
> [1] 39824 2
>
>> length(mget(mappedRkeys(org.Hs.egSYMBOL2EG), org.Hs.egSYMBOL2EG))
> [1] 39800
>> length(mget(mappedLkeys(org.Hs.egSYMBOL), org.Hs.egSYMBOL))
> [1] 39824
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] KEGG.db_2.2.5 GOstats_2.8.0 Category_2.8.4 genefilter_1.22.0 survival_2.34-1 RBGL_1.18.0 annotate_1.20.1
> [8] xtable_1.5-4 GO.db_2.2.5 graph_1.20.0 org.Hs.eg.db_2.2.6 RSQLite_0.7-1 DBI_0.2-4 AnnotationDbi_1.4.3
> [15] Biobase_2.2.2
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.12 gdata_2.4.2 gplots_2.6.0 GSEABase_1.4.0 gtools_2.5.0-1 xlsReadWritePro_1.4.0
> [7] XML_2.1-0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319