[BioC] Genbank to Unigene IDs

Dave Waddell dwaddell at nutecsciences.com
Thu Apr 15 15:49:51 CEST 2004


I'm struggling with the same problem and am currently using the command-line
version of MatchMiner:
http://discover.nci.nih.gov/matchminer/html/command.jsp

MatchMiner accepts a whole slew of input types and can produce almost any
output. For example, suppose tempFile contains the following list of GenBank
accession numbers:
AA936757
AA683077
R60193
AA495846
AA488391
AA487582
AA115076
N92478
R43483
W65461
R22625
N64741
H99588
AI091770
N47099
AA927490
H93335
AA460756
H91651
R98064
N92519
H57309
AA676254
R70685
AA156324
AA970865
AA426311
AI266752

Then running Matchminer (you have to give all of the options on the command
line or it will go into interactive mode):

java -jar MatchMiner.jar -Tlookup -ORhuman -I1accno -IA1genebankaccnumber -OTsymbol -Arefseqnumber -IF1tempFile -OFstdout -HStrue

will produce the output:
                        **************************************
                                 Matchminer Build:115
                        Genomics and Bioinformatics Group,NCI
                                        NIH
                        **************************************

Input Summary                                           Value

Build                                           115
Date                                            Thursday, April 15, 2004
Operation                                       Lookup
Organism                                        Homo sapiens
Input Source Name                               C:\Temp\tempFile.txt
Input Type                                      GenBank Accession Number
Input Algorithm                                 GenBank(All inc. RefSeq)
Output Type                                     Symbol
Output Algorithm                                RefSeq (DNA, RNA and Protein)

Lookup Summary:
   22 Items from the input list that has output
   6 Items from the input list with no output
   2 Items from the input list that were not found in the database

Function        Original Order  Input GenBank Accession Number  Output Symbol   Mult. Assoc. in Input GenBank Accession Number  Index
Lookup Output   19      H91651  NM_002041       Y       2357
Lookup Output   19      H91651  NM_005254       Y       2357
Lookup Output   19      H91651  NM_016654       Y       2357
Lookup Output   19      H91651  NM_016655       Y       2357
Lookup Output   19      H91651  NM_181427       Y       2357
Lookup Output   19      H91651  NM_017976       Y       2357
Lookup Output   17      H93335  NM_022465       -       16249
Lookup Output   13      H99588  NM_002285       -       3642
Lookup Output   5       AA488391        NM_005875       -       9334
Lookup Output   11      R22625  NM_001799       -       959
Lookup Output   10      W65461  NM_004419       -       1694
Lookup Output   7       AA115076        NM_006079       -       9407
Lookup Output   6       AA487582        NM_000127       -       1952
Lookup Output   2       AA683077        NM_002745       -       5222
Lookup Output   2       AA683077        NM_138957       -       5222
Lookup Output   25      AA156324        NM_004613       -       6604
Lookup Output   25      AA156324        NM_198951       -       6604
Lookup Output   26      AA970865        NM_021167       -       15857
Lookup Output   3       R60193  NM_014423       -       11919
Lookup Output   15      N47099  NM_005901       -       3811
Lookup Output   16      AA927490        NM_005419       Y       6344
Lookup Output   16      AA927490        NM_001638       Y       6344
Lookup Output   27      AA426311        NM_004527       -       3937
Lookup Output   27      AA426311        NM_013999       -       3937
Lookup Output   19      H91651  NM_017976       Y       14380
Lookup Output   19      H91651  NM_181427       Y       14380
Lookup Output   19      H91651  NM_002041       Y       14380
Lookup Output   19      H91651  NM_005254       Y       14380
Lookup Output   19      H91651  NM_016654       Y       14380
Lookup Output   19      H91651  NM_016655       Y       14380
Lookup Output   9       R43483  NM_000210       -       3408
Lookup Output   22      H57309  NM_003068       -       6168
Lookup Output   16      AA927490        NM_001638       Y       302
Lookup Output   16      AA927490        NM_005419       Y       302
Lookup Output   18      AA460756        NM_005056       -       5540
Lookup Output   1       AA936757        NM_005130       -       9059
Lookup Output   4       AA495846        NM_001453       -       2109
No Output       21      N92519  -       -       325911
No Output       20      R98064  -       -       63470
No Output       28      AI266752        -       -       49752
No Output       8       N92478  -       -       60114
No Output       23      AA676254        -       -       603790
No Output       14      AI091770        -       -       63315
No GeneIndex    12      N64741  -       -       -
No GeneIndex    24      R70685  -       -       -
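
If you want to post-process a table like this in a script, something along
these lines might work. It is only a rough Python sketch: it assumes the rows
are whitespace-delimited with the fields in the order shown above, and
parse_lookup is a hypothetical helper name of my own, not part of MatchMiner.
Repeated rows (the same accession and output listed under more than one Index)
are collapsed.

```python
from collections import defaultdict


def parse_lookup(text):
    """Collect output IDs per input accession from MatchMiner lookup rows.

    Assumes whitespace-separated rows laid out as in the listing above:
    Function (two words), Original Order, Input accession, Output,
    Mult. Assoc. flag, Index. Banner, summary and header lines are skipped.
    """
    hits = defaultdict(list)  # accession -> distinct output IDs, in first-seen order
    missing = []              # accessions flagged "No Output" or "No GeneIndex"
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly 7 fields and a numeric "Original Order".
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        status = " ".join(fields[:2])
        accession, output = fields[3], fields[4]
        if status == "Lookup Output":
            if output not in hits[accession]:
                hits[accession].append(output)
        elif status in ("No Output", "No GeneIndex"):
            if accession not in missing:
                missing.append(accession)
    return dict(hits), missing


# A few rows copied from the output above.
sample = """\
Lookup Output   19      H91651  NM_002041       Y       2357
Lookup Output   19      H91651  NM_002041       Y       14380
Lookup Output   2       AA683077        NM_002745       -       5222
Lookup Output   2       AA683077        NM_138957       -       5222
No Output       21      N92519  -       -       325911
No GeneIndex    12      N64741  -       -       -
"""

hits, missing = parse_lookup(sample)
print(hits)
print(missing)
```

Lists are used instead of sets so that the IDs come back in the order
MatchMiner printed them.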

For your example, substitute "-OTunigene -Aunigenenumber" for the output
options. I have had less success with UniGene IDs; in fact, your example
should produce:
Hs.429506
But with NM_004551 as the only entry in the input file, it gives:
java -jar MatchMiner.jar -Tlookup -ORhuman -I1accno -IA1genebankaccnumber -OTunigene -Aunigenenumber -IF1C:\Temp\tempFile.txt -OFstdout -HStrue
                        **************************************
                                 Matchminer Build:115
                        Genomics and Bioinformatics Group,NCI
                                        NIH
                        **************************************

Input Summary                                           Value

Build                                           115
Date                                            Thursday, April 15, 2004
Operation                                       Lookup
Organism                                        Homo sapiens
Input Source Name                               C:\Temp\tempFile.txt
Input Type                                      GenBank Accession Number
Input Algorithm                                 GenBank(All inc. RefSeq)
Output Type                                     UniGene Cluster Id
Output Algorithm                                Active UniGene Cluster Ids

Lookup Summary:
   0 Items from the input list that has output
   0 Items from the input list with no output
   1 Items from the input list that were not found in the database

Function        Original Order  Input GenBank Accession Number  Output UniGene Cluster Id       Mult. Assoc. in Input GenBank Accession Number  Index
No GeneIndex    1       NM_004551       -       -       -
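
Since every option has to be spelled out to stay out of interactive mode, it
can help to build the command in a script and change only the output options.
The following is a minimal Python sketch, not a tested wrapper: the flag values
are copied from the two runs in this message, their meanings are my reading of
the Input Summary rather than anything checked against the MatchMiner
documentation, and matchminer_args is a hypothetical helper name.

```python
def matchminer_args(input_file, output_type="symbol",
                    output_algorithm="refseqnumber", jar="MatchMiner.jar"):
    """Return the argument list for a non-interactive MatchMiner lookup.

    Flag values are taken verbatim from the runs shown in this message;
    the comments give their apparent meaning, which is an assumption.
    """
    return [
        "java", "-jar", jar,
        "-Tlookup",                # operation: Lookup
        "-ORhuman",                # organism: Homo sapiens
        "-I1accno",                # input type: GenBank accession number
        "-IA1genebankaccnumber",   # input algorithm
        "-OT" + output_type,       # output type, e.g. symbol or unigene
        "-A" + output_algorithm,   # output algorithm, e.g. refseqnumber
        "-IF1" + input_file,       # input file
        "-OFstdout",               # write results to standard output
        "-HStrue",                 # copied as-is; purpose not checked
    ]


# The RefSeq run and the UniGene run differ only in the output options.
refseq_cmd = matchminer_args("tempFile")
unigene_cmd = matchminer_args(r"C:\Temp\tempFile.txt",
                              output_type="unigene",
                              output_algorithm="unigenenumber")
print(" ".join(refseq_cmd))
print(" ".join(unigene_cmd))
```

Either list can be handed to subprocess.run(cmd, capture_output=True, text=True)
to capture the report as a string.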

Dave.

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Gordon Smyth
Sent: Thursday, April 15, 2004 2:45 AM
To: BioC Mailing List
Subject: [BioC] Genbank to Unigene IDs

I have a list of GenBank IDs for which I'd like the corresponding Unigene 
cluster IDs. What is the easiest way to do this using Bioconductor 
functions? (I've scanned annotate and AnnBuilder help and vignettes, 
although way too quickly.)

For the sake of being specific, here's a concrete example. What's Unigene 
for GB="NM_004551"?

Thanks a lot
Gordon

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor


