[Bioc-devel] Behavior of select function in AnnotationDbi
Morgan, Martin
Martin.Morgan at roswellpark.org
Fri Nov 20 23:37:42 CET 2015
I think the answer to 1 should be yes, duplicate keys are allowed. For instance, a user might have a vector of ids and a factor that groups the ids somehow (e.g., by experiment), with ids unique within each group but repeated across groups.
So I'm for step #2.
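A minimal sketch of that use case in base R, assuming a toy two-row lookup table in place of org.Hs.eg.db (the names `ids`, `experiment`, `symbol_table`, and `annotated` are made up for illustration). It only shows why keeping one result row per input key is convenient when the ids carry a grouping factor:

```r
# ids repeat across experiments but are unique within each experiment
ids <- c("1", "2", "1")
experiment <- factor(c("exptA", "exptA", "exptB"))

# Toy stand-in for a 1:1 annotation table (hypothetical, not the real database)
symbol_table <- data.frame(
  ENTREZID = c("1", "2"),
  SYMBOL = c("A1BG", "A2M"),
  stringsAsFactors = FALSE
)

# match() returns one position per input id, so duplicates are preserved
# and the result lines up row by row with the grouping factor
annotated <- data.frame(
  ENTREZID = ids,
  experiment = experiment,
  SYMBOL = symbol_table$SYMBOL[match(ids, symbol_table$ENTREZID)],
  stringsAsFactors = FALSE
)
print(annotated)
```

If duplicates were dropped before the lookup, the result would have two rows and could no longer be bound to the three-element factor without an extra merge step.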
Martin
________________________________________
From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of James W. MacDonald [jmacdon at uw.edu]
Sent: Friday, November 20, 2015 5:30 PM
To: bioc-devel at r-project.org
Subject: [Bioc-devel] Behavior of select function in AnnotationDbi
There is an inconsistency in how select() works in AnnotationDbi when a
user passes in duplicated keys to be mapped, depending on whether the
mapping is 1:1 or 1:many. It's easiest to show using an example.
> select(org.Hs.eg.db, rep("1", 3), "SYMBOL")
'select()' returned many:1 mapping between keys and columns
  ENTREZID SYMBOL
1        1   A1BG
2        1   A1BG
3        1   A1BG
> select(org.Hs.eg.db, rep("1", 3), "GO")
'select()' returned many:many mapping between keys and columns
  ENTREZID         GO EVIDENCE ONTOLOGY
1        1 GO:0003674       ND       MF
2        1 GO:0003674       ND       MF
3        1 GO:0003674       ND       MF
This is obviously a bug. A single query for that ID results in this:
> select(org.Hs.eg.db, "1", "GO")
'select()' returned 1:many mapping between keys and columns
  ENTREZID         GO EVIDENCE ONTOLOGY
1        1 GO:0003674       ND       MF
2        1 GO:0005576      IDA       CC
3        1 GO:0005615      IDA       CC
4        1 GO:0008150       ND       BP
5        1 GO:0070062      IDA       CC
6        1 GO:0072562      IDA       CC
So the returned results are completely borked: the duplicated-key query
returns only the first GO term, three times, and drops the other five.
However, the question I have is what should be returned? To be consistent
with the first example, it should be the output expected for a single key,
repeated three times, which I have patched AnnotationDbi to do:
> select(org.Hs.eg.db, rep("1", 3), "GO")
'select()' returned many:many mapping between keys and columns
   ENTREZID         GO EVIDENCE ONTOLOGY
1         1 GO:0003674       ND       MF
2         1 GO:0005576      IDA       CC
3         1 GO:0005615      IDA       CC
4         1 GO:0008150       ND       BP
5         1 GO:0070062      IDA       CC
6         1 GO:0072562      IDA       CC
7         1 GO:0003674       ND       MF
8         1 GO:0005576      IDA       CC
9         1 GO:0005615      IDA       CC
10        1 GO:0008150       ND       BP
11        1 GO:0070062      IDA       CC
12        1 GO:0072562      IDA       CC
13        1 GO:0003674       ND       MF
14        1 GO:0005576      IDA       CC
15        1 GO:0005615      IDA       CC
16        1 GO:0008150       ND       BP
17        1 GO:0070062      IDA       CC
18        1 GO:0072562      IDA       CC
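For readers who want to see the intended behavior outside AnnotationDbi, here is a rough base R sketch. It is not the actual patch: `go_table` is a toy table holding three of the GO rows shown above, and `expand_by_key()` is a hypothetical helper. The idea is to split the table by key once, then emit the full block of rows for every occurrence of a key in the input, in input order.

```r
# Toy stand-in for the annotation table (three of the GO rows for key "1")
go_table <- data.frame(
  ENTREZID = c("1", "1", "1"),
  GO = c("GO:0003674", "GO:0005576", "GO:0005615"),
  ONTOLOGY = c("MF", "CC", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat each key's complete set of rows once per
# occurrence of that key in `keys`, preserving the input order
expand_by_key <- function(keys, table, keycol = "ENTREZID") {
  per_key <- split(table, table[[keycol]])
  blocks <- lapply(keys, function(k) {
    hit <- per_key[[k]]
    if (is.null(hit)) {
      # assumption: an unmatched key is kept as a single row of NAs
      hit <- table[NA_integer_, , drop = FALSE]
      hit[[keycol]] <- k
    }
    hit
  })
  out <- do.call(rbind, blocks)
  rownames(out) <- NULL
  out
}

res <- expand_by_key(rep("1", 3), go_table)
print(res)
```

With three copies of the key and three rows per key, `res` has nine rows, with the same three GO terms cycling in order, which mirrors the shape of the patched output above.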
So, two questions.
1. Should duplicate keys be allowed, or should duplicates be removed
before querying the database, preferably with a message saying that dups
were removed?
2. If the answer to #1 is yes, then to be consistent, I will just commit
the patch I have made to both devel and release.
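For comparison, the alternative in question 1 (drop duplicates up front and tell the user) could look roughly like this in base R. `dedupe_keys()` is a hypothetical helper written for this message, not a function in AnnotationDbi:

```r
# Hypothetical helper: remove repeated keys, keeping the first occurrence,
# and report how many were dropped
dedupe_keys <- function(keys) {
  dup <- duplicated(keys)
  if (any(dup)) {
    message(sum(dup), " duplicated key(s) removed before querying")
  }
  keys[!dup]
}

uniq <- dedupe_keys(c("1", "2", "1", "1"))
print(uniq)
```

The cost of this option is that the result no longer has one block of rows per input key, so callers who rely on the input order or length have to re-expand the result themselves.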
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099