[Bioc-devel] Behavior of select function in AnnotationDbi
James W. MacDonald
jmacdon at uw.edu
Fri Nov 20 23:30:44 CET 2015
There is an inconsistency in how select() works in AnnotationDbi when a
user passes in duplicated keys to be mapped, depending on whether the
mapping is 1:1 or 1:many. It's easiest to show using an example.
> select(org.Hs.eg.db, rep("1", 3), "SYMBOL")
'select()' returned many:1 mapping between keys and columns
ENTREZID SYMBOL
1 1 A1BG
2 1 A1BG
3 1 A1BG
> select(org.Hs.eg.db, rep("1", 3), "GO")
'select()' returned many:many mapping between keys and columns
ENTREZID GO EVIDENCE ONTOLOGY
1 1 GO:0003674 ND MF
2 1 GO:0003674 ND MF
3 1 GO:0003674 ND MF
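This is obviously a bug: with duplicated keys, the GO query returns only the
first GO term, once per key, instead of every term for that ID. A single
query for that ID results in this: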
> select(org.Hs.eg.db, "1", "GO")
'select()' returned 1:many mapping between keys and columns
ENTREZID GO EVIDENCE ONTOLOGY
1 1 GO:0003674 ND MF
2 1 GO:0005576 IDA CC
3 1 GO:0005615 IDA CC
4 1 GO:0008150 ND BP
5 1 GO:0070062 IDA CC
6 1 GO:0072562 IDA CC
So the returned results are completely borked.
However, the question I have is what should be returned? To be consistent
with the first example, it should be the output expected for a single key,
repeated three times, which I have patched AnnotationDbi to do:
> select(org.Hs.eg.db, rep("1", 3), "GO")
'select()' returned many:many mapping between keys and columns
ENTREZID GO EVIDENCE ONTOLOGY
1 1 GO:0003674 ND MF
2 1 GO:0005576 IDA CC
3 1 GO:0005615 IDA CC
4 1 GO:0008150 ND BP
5 1 GO:0070062 IDA CC
6 1 GO:0072562 IDA CC
7 1 GO:0003674 ND MF
8 1 GO:0005576 IDA CC
9 1 GO:0005615 IDA CC
10 1 GO:0008150 ND BP
11 1 GO:0070062 IDA CC
12 1 GO:0072562 IDA CC
13 1 GO:0003674 ND MF
14 1 GO:0005576 IDA CC
15 1 GO:0005615 IDA CC
16 1 GO:0008150 ND BP
17 1 GO:0070062 IDA CC
18 1 GO:0072562 IDA CC
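To make the intended semantics concrete, here is a minimal base R sketch. It is not the actual AnnotationDbi patch: the toy table `go_tab` (filled with the six rows shown above for ID "1") and the helper `expand_dups()` are hypothetical names used only for illustration. The idea is to look up each unique key once and then repeat its block of rows once per occurrence of the key in the input.

```r
# Toy stand-in for the GO rows of Entrez ID "1" (values copied from the
# single-key output above). This is NOT a real annotation database.
go_tab <- data.frame(
  ENTREZID = rep("1", 6),
  GO = c("GO:0003674", "GO:0005576", "GO:0005615",
         "GO:0008150", "GO:0070062", "GO:0072562"),
  EVIDENCE = c("ND", "IDA", "IDA", "ND", "IDA", "IDA"),
  ONTOLOGY = c("MF", "CC", "CC", "BP", "CC", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group row indices by key, then pull the whole group
# once for every occurrence of that key in 'keys', preserving input order.
# Keys absent from the table are silently dropped in this sketch.
expand_dups <- function(tab, keys, keycol = "ENTREZID") {
  by_key <- split(seq_len(nrow(tab)), tab[[keycol]])
  idx <- unlist(by_key[keys], use.names = FALSE)
  res <- tab[idx, , drop = FALSE]
  rownames(res) <- NULL
  res
}

res <- expand_dups(go_tab, rep("1", 3))
nrow(res)  # 18: six GO rows repeated for each of the three keys
```

With a 1:1 table the same helper would return one row per input key, which matches the SYMBOL example at the top.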
So, two questions.
1. Should duplicate keys be allowed, or should duplicates be removed
before querying the database, preferably with a message saying that dups
were removed?
2. If duplicate keys should be allowed, then to be consistent, I will just
commit the patch I have made to both devel and release.
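For comparison, the alternative in question 1 (drop duplicates up front and tell the user) could look roughly like the sketch below. The helper name `dedup_keys()` and the message wording are assumptions, not anything that exists in AnnotationDbi.

```r
# Hypothetical helper: remove duplicated keys before a query and report
# how many were dropped. The first occurrence of each key is kept.
dedup_keys <- function(keys) {
  dups <- duplicated(keys)
  if (any(dups)) {
    message(sum(dups), " duplicated key(s) removed before querying")
  }
  keys[!dups]
}

keys <- dedup_keys(rep("1", 3))
```

The cleaned `keys` vector would then be passed to select() as usual, so the result has the same shape as a single-key query.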
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099