[BioC] Genbank to Unigene IDs
Dave Waddell
dwaddell at nutecsciences.com
Thu Apr 15 15:49:51 CEST 2004
I'm struggling with the same problem and using the command line version of
Matchminer right now.
http://discover.nci.nih.gov/matchminer/html/command.jsp
For example, suppose I have a list of GenBank accession numbers (Matchminer will
take a whole slew of inputs and produce almost any output) in a file called
tempFile, one per line:
AA936757
AA683077
R60193
AA495846
AA488391
AA487582
AA115076
N92478
R43483
W65461
R22625
N64741
H99588
AI091770
N47099
AA927490
H93335
AA460756
H91651
R98064
N92519
H57309
AA676254
R70685
AA156324
AA970865
AA426311
AI266752
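If your accessions start out in a script rather than a file, a minimal Python sketch like the following can write them in the same one-accession-per-line layout. The helper name write_accession_file is hypothetical, and the sketch assumes plain newline-separated text is all Matchminer needs, as in the listing above. Matchminer reports each row's position in this file as its "Original Order" (AA936757 comes back as 1, AA683077 as 2, and so on), so keep the file in the order you care about.

```python
from pathlib import Path


def write_accession_file(accessions, path):
    """Write accessions one per line (hypothetical helper, not part of Matchminer).

    Surrounding whitespace is stripped and blank entries are skipped; the
    input order is preserved. Returns the number of accessions written.
    """
    cleaned = [acc.strip() for acc in accessions if acc.strip()]
    Path(path).write_text("".join(acc + "\n" for acc in cleaned))
    return len(cleaned)


# Example: the first three accessions from the list above.
# write_accession_file(["AA936757", "AA683077", "R60193"], "tempFile")
```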
Then run Matchminer (you have to give all of the options on the command
line, or it will go into interactive mode):
java -jar MatchMiner.jar -Tlookup -ORhuman -I1accno -IA1genebankaccnumber -OTsymbol -Arefseqnumber -IF1tempFile -OFstdout -HStrue
This produces the following output:
**************************************
Matchminer Build:115
Genomics and Bioinformatics Group,NCI
NIH
**************************************
Input Summary Value
Build 115
Date Thursday, April 15, 2004
Operation Lookup
Organism Homo sapiens
Input Source Name C:\Temp\tempFile.txt
Input Type GenBank Accession Number
Input Algorithm GenBank(All inc. RefSeq)
Output Type Symbol
Output Algorithm RefSeq (DNA, RNA and Protein)
Lookup Summary:
22 Items from the input list that has output
6 Items from the input list with no output
2 Items from the input list that were not found in the database
Function | Original Order | Input GenBank Accession Number | Output Symbol | Mult. Assoc. in Input GenBank Accession Number | Index
Lookup Output 19 H91651 NM_002041 Y 2357
Lookup Output 19 H91651 NM_005254 Y 2357
Lookup Output 19 H91651 NM_016654 Y 2357
Lookup Output 19 H91651 NM_016655 Y 2357
Lookup Output 19 H91651 NM_181427 Y 2357
Lookup Output 19 H91651 NM_017976 Y 2357
Lookup Output 17 H93335 NM_022465 - 16249
Lookup Output 13 H99588 NM_002285 - 3642
Lookup Output 5 AA488391 NM_005875 - 9334
Lookup Output 11 R22625 NM_001799 - 959
Lookup Output 10 W65461 NM_004419 - 1694
Lookup Output 7 AA115076 NM_006079 - 9407
Lookup Output 6 AA487582 NM_000127 - 1952
Lookup Output 2 AA683077 NM_002745 - 5222
Lookup Output 2 AA683077 NM_138957 - 5222
Lookup Output 25 AA156324 NM_004613 - 6604
Lookup Output 25 AA156324 NM_198951 - 6604
Lookup Output 26 AA970865 NM_021167 - 15857
Lookup Output 3 R60193 NM_014423 - 11919
Lookup Output 15 N47099 NM_005901 - 3811
Lookup Output 16 AA927490 NM_005419 Y 6344
Lookup Output 16 AA927490 NM_001638 Y 6344
Lookup Output 27 AA426311 NM_004527 - 3937
Lookup Output 27 AA426311 NM_013999 - 3937
Lookup Output 19 H91651 NM_017976 Y 14380
Lookup Output 19 H91651 NM_181427 Y 14380
Lookup Output 19 H91651 NM_002041 Y 14380
Lookup Output 19 H91651 NM_005254 Y 14380
Lookup Output 19 H91651 NM_016654 Y 14380
Lookup Output 19 H91651 NM_016655 Y 14380
Lookup Output 9 R43483 NM_000210 - 3408
Lookup Output 22 H57309 NM_003068 - 6168
Lookup Output 16 AA927490 NM_001638 Y 302
Lookup Output 16 AA927490 NM_005419 Y 302
Lookup Output 18 AA460756 NM_005056 - 5540
Lookup Output 1 AA936757 NM_005130 - 9059
Lookup Output 4 AA495846 NM_001453 - 2109
No Output 21 N92519 - - 325911
No Output 20 R98064 - - 63470
No Output 28 AI266752 - - 49752
No Output 8 N92478 - - 60114
No Output 23 AA676254 - - 603790
No Output 14 AI091770 - - 63315
No GeneIndex 12 N64741 - - -
No GeneIndex 24 R70685 - - -
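Because one accession can map to several outputs, and some rows repeat under a second Index value (H91651 and AA927490 each appear under two indexes), it can help to collapse the table per input accession. A minimal Python sketch, assuming the rows are whitespace-separated exactly as shown here (str.split() also copes with tabs) and that every data row has five fields after the function label; parse_lookup is a hypothetical name, not part of Matchminer:

```python
from collections import defaultdict

# Function labels that appear in the lookup output above.
FUNCTIONS = ("Lookup Output", "No Output", "No GeneIndex")


def parse_lookup(lines):
    """Group Matchminer lookup rows by input accession.

    Returns (hits, misses): hits maps accession -> sorted unique outputs,
    misses maps accession -> its function label ("No Output"/"No GeneIndex").
    Banner, summary and header lines are skipped.
    """
    hits = defaultdict(set)
    misses = {}
    for line in lines:
        line = line.strip()
        func = next((f for f in FUNCTIONS if line.startswith(f + " ")), None)
        if func is None:
            continue  # not a data row
        fields = line[len(func):].split()
        if len(fields) != 5:
            continue  # unexpected layout; ignore rather than guess
        # original order, input accession, output, mult. assoc., index
        _order, accession, output, _mult, _index = fields
        if func == "Lookup Output":
            hits[accession].add(output)  # a set drops the repeated rows
        else:
            misses[accession] = func
    return {acc: sorted(outs) for acc, outs in hits.items()}, misses
```

Applied to the full listing, hits holds one entry per accession that produced output and misses holds the "No Output" and "No GeneIndex" accessions.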
For your example, substitute "-OTunigene -Aunigenenumber" for the output
options. I have had less success with UniGene IDs; in fact, your example
should produce:
Hs.429506
but running it with tempFile.txt containing just NM_004551:
java -jar MatchMiner.jar -Tlookup -ORhuman -I1accno -IA1genebankaccnumber -OTunigene -Aunigenenumber -IF1C:\Temp\tempFile.txt -OFstdout -HStrue
gives:
**************************************
Matchminer Build:115
Genomics and Bioinformatics Group,NCI
NIH
**************************************
Input Summary Value
Build 115
Date Thursday, April 15, 2004
Operation Lookup
Organism Homo sapiens
Input Source Name C:\Temp\tempFile.txt
Input Type GenBank Accession Number
Input Algorithm GenBank(All inc. RefSeq)
Output Type UniGene Cluster Id
Output Algorithm Active UniGene Cluster Ids
Lookup Summary:
0 Items from the input list that has output
0 Items from the input list with no output
1 Items from the input list that were not found in the database
Function | Original Order | Input GenBank Accession Number | Output UniGene Cluster Id | Mult. Assoc. in Input GenBank Accession Number | Index
No GeneIndex 1 NM_004551 - - -
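Since the two runs differ only in the -OT (output type) and -A (output algorithm) options, a small wrapper that assembles the argument list keeps the rest of the flags fixed and always passes everything on the command line, so Matchminer never drops into interactive mode. A minimal Python sketch, assuming Java and a local copy of MatchMiner.jar; matchminer_args and run_matchminer are hypothetical names, and only the flags from the two commands in this message are used:

```python
import shlex
import subprocess


def matchminer_args(input_file, output_type="symbol",
                    output_algorithm="refseqnumber", jar="MatchMiner.jar"):
    """Assemble a non-interactive Matchminer lookup command (hypothetical helper).

    Uses only the flags from the two commands in this thread; only the
    output type (-OT), output algorithm (-A) and input file (-IF1) vary.
    """
    return [
        "java", "-jar", jar,
        "-Tlookup",
        "-ORhuman",
        "-I1accno",
        "-IA1genebankaccnumber",
        "-OT" + output_type,        # e.g. "symbol" or "unigene"
        "-A" + output_algorithm,    # e.g. "refseqnumber" or "unigenenumber"
        "-IF1" + input_file,
        "-OFstdout",
        "-HStrue",
    ]


def run_matchminer(args):
    """Run the command and return its stdout (needs Java and the jar)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the UniGene variant so it can be checked before running it.
    print(shlex.join(matchminer_args(r"C:\Temp\tempFile.txt",
                                     output_type="unigene",
                                     output_algorithm="unigenenumber")))
```

Keeping the flags in a list (rather than one shell string) also avoids quoting trouble with Windows paths such as C:\Temp\tempFile.txt.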
Dave.
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Gordon Smyth
Sent: Thursday, April 15, 2004 2:45 AM
To: BioC Mailing List
Subject: [BioC] Genbank to Unigene IDs
I have a list of GenBank IDs for which I'd like the corresponding Unigene
cluster IDs. What is the easiest way to do this using Bioconductor
functions? (I've scanned annotate and AnnBuilder help and vignettes,
although way too quickly.)
For the sake of being specific, here's a concrete example. What's Unigene
for GB="NM_004551"?
Thanks a lot
Gordon
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor