[BioC] Genbank to Unigene IDs

Thu Apr 15 16:37:07 CEST 2004

I tried running this but got an error:
> library(AnnBuilder)
> myBaseType <- "gb"
> myDir <- "C:/Temp"
> myBase <- "C:/Temp/tempFile.txt"
> mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")
> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
+      myBaseType, pkgName = "Hum_Agi1A", pkgPath = myDir,organism =
+      "human",  version = "1.0",
+      makeXML = TRUE, author = list(author = "dpritch", maintainer =
+      "dpritch at u.washington.edu"), fromWeb = TRUE)
[1] "It may take me a while to process the data. Be patient!"
Warning message: 
cannot open file `C:/R/rw1090beta/library/AnnBuilder/temp/tempOut31783' 
Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) : 
        Failed to get or parse LocusLink data because of:

 Error in file(file, "r") : unable to open connection

I had already cleared the "Read Only" attribute on that directory and
confirmed from within R that I have write permission there:
> setwd("C:/R/rw1090beta/library/AnnBuilder/temp")
> dir()
[1] "file24842Tgo.xml" "README"          
> write("Hello")
> dir()
[1] "data"             "file24842Tgo.xml" "README"
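The write("Hello") check above works, but it leaves a stray file named "data" behind, because that is write()'s default file name. A tidier test is to create a scratch file in the directory and delete it again. The helper below, canWriteTo(), is a hypothetical name, not part of AnnBuilder. It only tests write access to the directory; it says nothing about whether the LocusLink download itself succeeded.

```r
## Hypothetical helper: try to create (then delete) a scratch file in `dir`.
## A successful file.create() is a direct test of write access there.
canWriteTo <- function(dir) {
    probe <- tempfile(pattern = "writetest", tmpdir = dir)
    ## file.create() warns and returns FALSE on failure; keep just the flag
    ok <- suppressWarnings(file.create(probe))
    if (ok) unlink(probe)  # remove the scratch file so nothing is left behind
    ok
}
```

For example, canWriteTo("C:/R/rw1090beta/library/AnnBuilder/temp") should return TRUE if the directory is writable from the current R session.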

I get the same error if I run example("ABPkgBuilder").

Any suggestions?

Dave.
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of A.J. Rossini
Sent: Thursday, April 15, 2004 8:48 AM
To: Gordon Smyth
Cc: BioC Mailing List
Subject: Re: [BioC] Genbank to Unigene IDs

Gordon Smyth <smyth at wehi.edu.au> writes:

> I have a list of GenBank IDs for which I'd like the corresponding
> Unigene cluster IDs. What is the easiest way to do this using
> Bioconductor functions? (I've scanned annotate and AnnBuilder help and
> vignettes, although way too quickly.)
>
> For the sake of being specific, here's a concrete example. What's
> Unigene for GB="NM_004551"?

Here's what I'd do (more of a chip-style analysis than instant
WWW-based gratification, which might also be possible):

1. First create a tab-separated, 2-column file: the first column holds
probe IDs (dummy or real), the second column the GenBank IDs. For your
example you'd have a single row in a file called "Dummy.tsv":

1    NM_004551

2. Run a script similar to:

library(AnnBuilder)
myBaseType <- "gb"
# myDir is the directory where the data package will be built;
# change it to suit the directory structure on your machine
myDir <- "C:/DavidsData/Annotation_Folders"

# myBase is the file that maps probe IDs (here, the dummy IDs)
# to GenBank IDs
myBase <- "C:/DavidsData/Annotation_Folders/Dummy.tsv"

# use AnnBuilder's internal list of data source URLs
mySrcUrls <- getSrcUrl(src = "ALL", organism = "human")

# invoke ABPkgBuilder
ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
             baseMapType = myBaseType, pkgName = "Hum_Agi1A",
             pkgPath = myDir, organism = "human", version = "1.0",
             makeXML = TRUE,
             author = list(author = "dpritch",
                           maintainer = "dpritch at u.washington.edu"),
             fromWeb = TRUE)

3. Install the resulting data package.

4. Load the package and use it to look up the UniGene IDs (you can also
verify the ID mapping against the XML output file).
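With a longer list of accessions, steps 1 and 4 can be scripted. This is a sketch only: writeBaseFile() and gbToUnigene() are hypothetical helpers, not AnnBuilder functions. It assumes that the built package exposes a probe-to-UniGene environment whose keys are the dummy probe IDs written in step 1, so a GenBank ID is translated by first finding its probe ID and then looking that up.

```r
## Step 1 (hypothetical helper): write the 2-column, tab-separated base file,
## column 1 = dummy probe ID (1, 2, 3, ...), column 2 = GenBank accession.
writeBaseFile <- function(gbIds, path) {
    base <- data.frame(probe = as.character(seq_along(gbIds)),
                       gb = gbIds, stringsAsFactors = FALSE)
    write.table(base, file = path, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    invisible(base)
}

## Step 4 (hypothetical helper): GenBank ID -> dummy probe ID -> UniGene ID.
## ASSUMPTION: unigeneEnv is the built package's probe-to-UniGene environment.
gbToUnigene <- function(gbIds, base, unigeneEnv) {
    probes <- base$probe[match(gbIds, base$gb)]
    ug <- rep(NA_character_, length(gbIds))
    ok <- !is.na(probes)
    ## mget() returns NA for probes missing from the environment;
    ## keep only the first UniGene ID if a probe maps to several
    hits <- mget(probes[ok], envir = unigeneEnv, ifnotfound = NA)
    ug[ok] <- vapply(hits, function(x) as.character(x[1]), character(1))
    names(ug) <- gbIds
    ug
}
```

For the NM_004551 example, base <- writeBaseFile("NM_004551", myBase) writes the same one-row Dummy.tsv as step 1. After the package is built, installed and loaded, a call such as gbToUnigene("NM_004551", base, Hum_Agi1AUNIGENE) would do the lookup. The name Hum_Agi1AUNIGENE is only a guess based on the <pkgName><MAP> naming used by Bioconductor annotation packages; check ls("package:Hum_Agi1A") for the actual name.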

best,
-tony

-- 
rossini at u.washington.edu            http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email


_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor