[BioC] Genbank to Unigene IDs
Dave Waddell
dwaddell at nutecsciences.com
Thu Apr 15 21:09:30 CEST 2004
The output from:
mySrcUrl <- getSrcUrl("UG")
is
> mySrcUrl
[1] "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
This is rejected by ABPkgBuilder:
"Error in toupper(x) : non-character argument to toupper()"
When getSrcUrl is called with src = "ALL", it gives:
mySrcUrls <- getSrcUrl(src = "ALL",organism = "human")
> mySrcUrls
LL
"ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz"
GP
"http://www.genome.ucsc.edu/goldenPath/hg16/database/"
UG
"ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
GO
"http://www.godatabase.org/dev/database/archive/2004-03-01/go_200403-termdb.xml.gz"
KEGG
"ftp://ftp.genome.ad.jp/pub/kegg/pathways"
YG
"http://www.yeastgenome.org/DownloadContents.shtml"
HG
"ftp://ftp.ncbi.nih.gov/pub/HomoloGene/hmlg.ftp"
So I thought I might cheat and use:
mySrcUrl <- mySrcUrls[3]
> mySrcUrls[3]
UG
"ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
As you can see this gets rejected as well:
Error in loadFromUrl(srcUrl(object), dist) :
URL NA is incorrect or the target site is not responding!
Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) :
Failed to get or parse UniGene data becaus of:
Error in loadFromUrl(srcUrl(object), dist) :
URL NA is incorrect or the target site is not responding!
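A side note on the subsetting step above, as a sketch in base R only. It does not call AnnBuilder and does not claim to explain the error; it only shows how a named character vector behaves when subset, and how a lookup by a name that is absent yields NA. The two-element vector src_urls is a hypothetical stand-in for the getSrcUrl output.

```r
# Hypothetical stand-in for part of the getSrcUrl(src = "ALL") result.
src_urls <- c(
  LL = "ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz",
  UG = "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
)

# Single brackets keep the name, so the result is still a named vector.
one <- src_urls["UG"]
names(one)

# Double brackets drop the name and return the bare string.
bare <- src_urls[["UG"]]
names(bare)

# Looking up a name that is not present gives NA rather than an error.
absent <- src_urls["GO"]
is.na(absent)
```

If code downstream looks sources up by name, a vector that lacks an expected name would produce an NA in this way; whether that is what ABPkgBuilder does here is an assumption, not something the messages above confirm.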
Is it possible to use an annotation package that was built on Linux in the
Windows environment? If so, does anyone want to donate one?
Thanks, Dave.
-----Original Message-----
From: James MacDonald [mailto:jmacdon at med.umich.edu]
Sent: Thursday, April 15, 2004 9:52 AM
To: dwaddell at nutecsciences.com; bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] Genbank to Unigene IDs
You probably need to update your AnnBuilder. A recent version was using
the system temp directory instead of the AnnBuilder temp directory,
which didn't work well on Win32. AFAIK, the current devel version of
AnnBuilder has been rolled back to use the AnnBuilder temp dir.
As an aside, if all you need is GB -> UG mappings, it is probably
overkill to use ABPkgBuilder in this way, which is going to parse
LocusLink and KEGG as well (which takes some time). There are two
alternatives that I can think of (both untested by me). First, use
ABPkgBuilder, but only parse UG by changing the srcUrl to:
mySrcUrl <- getSrcUrl("UG")
Another possibility is to use the UG class directly. See ?UG.
Best,
Jim
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
>>> "Dave Waddell" <dwaddell at nutecsciences.com> 04/15/04 10:37AM >>>
I tried running this but got an error:
> library(AnnBuilder)
> myBaseType <- "gb"
> myDir <- "C:/Temp"
> myBase <- "C:/Temp/tempFile.txt"
> mySrcUrls <- getSrcUrl(src = "ALL",organism = "human")
> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
+ myBaseType, pkgName = "Hum_Agi1A", pkgPath = myDir,organism =
+ "human", version = "1.0",
+ makeXML = TRUE, author = list(author = "dpritch", maintainer =
+ "dpritch at u.washington.edu"), fromWeb = TRUE)
[1] "It may take me a while to process the data. Be patient!"
Warning message:
cannot open file `C:/R/rw1090beta/library/AnnBuilder/temp/tempOut31783'
Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) :
Failed to get or parse LocusLink data because of:
Error in file(file, "r") : unable to open connection
I had changed this directory from "Read Only" and checked that I had write
permissions from within R:
> setwd("C:/R/rw1090beta/library/AnnBuilder/temp")
> dir()
[1] "file24842Tgo.xml" "README"
> write("Hello")
> dir()
[1] "data" "file24842Tgo.xml" "README"
I get the same error if I run
example("ABPkgBuilder")
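For reference, the write-permission check above can also be done without changing the working directory. This is a minimal sketch in base R; can_write_dir is a hypothetical helper name, not part of AnnBuilder, and the check only shows that R can create a file in the directory, not that AnnBuilder will use that directory.

```r
# Hypothetical helper: TRUE if R can create (and then remove) a file in `dir`.
can_write_dir <- function(dir) {
  if (!dir.exists(dir)) return(FALSE)
  probe <- tempfile(pattern = "probe", tmpdir = dir)
  # file.create() returns FALSE with a warning on failure; treat both as FALSE.
  ok <- tryCatch(file.create(probe),
                 warning = function(w) FALSE,
                 error = function(e) FALSE)
  if (isTRUE(ok)) unlink(probe)
  isTRUE(ok)
}

# Example on the session temp directory; substitute the AnnBuilder temp path.
can_write_dir(tempdir())
```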
Any suggestions?
Dave.
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of A.J.
Rossini
Sent: Thursday, April 15, 2004 8:48 AM
To: Gordon Smyth
Cc: BioC Mailing List
Subject: Re: [BioC] Genbank to Unigene IDs
Gordon Smyth <smyth at wehi.edu.au> writes:
> I have a list of GenBank IDs for which I'd like the corresponding
> Unigene cluster IDs. What is the easiest way to do this using
> Bioconductor functions? (I've scanned annotate and AnnBuilder help and
> vignettes, although way too quickly.)
>
> For the sake of being specific, here's a concrete example. What's
> Unigene for GB="NM_004551"?
Here's what I'd do (more of a chip-style analysis than instant
WWW-based gratification, which might also be possible):
1. First create a tab-separated two-column file: the first column holds
dummy probe IDs (could be real or not), the second column the GB IDs. So,
you'd have 1 row in a file called "Dummy.tsv":
1	NM_004551
2. Have a script similar to:
library(AnnBuilder)
myBaseType <- "gb"
# myDir maps the directory where you want the data package built ---
# obviously this should be changed for the directory structure on the
# linux box
myDir <- "C:/DavidsData/Annotation_Folders"
# myBase maps the file that contains the mapping of Agilent feature
# numbers to GenBank ID's
myBase <- "C:/DavidsData/Annotation_Folders/Dummy.tsv"
#use AnnBuilder internal lists of data sources
mySrcUrls <- getSrcUrl(src = "ALL",organism = "human")
#invoke ABPkgBuilder
ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
             baseMapType = myBaseType, pkgName = "Hum_Agi1A",
             pkgPath = myDir, organism = "human", version = "1.0",
             makeXML = TRUE,
             author = list(author = "dpritch",
                           maintainer = "dpritch at u.washington.edu"),
             fromWeb = TRUE)
3. install the package environment
4. use it to find the IDs (can verify the ID mapping with the XML
output file, as well)
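Step 1 can be done from R itself. The following is a minimal sketch using only base R; the column names probe and gb are illustrative and are not written to the file, and the file is placed in the session temp directory here rather than in the C:/DavidsData path used above.

```r
# One dummy probe ID paired with the GenBank ID from the question.
base_map <- data.frame(probe = "1", gb = "NM_004551",
                       stringsAsFactors = FALSE)

# Write a headerless, unquoted, tab-separated file.
out_file <- file.path(tempdir(), "Dummy.tsv")
write.table(base_map, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm its contents.
readLines(out_file)
```

The path in out_file would then be passed as myBase in the script from step 2.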
best,
-tony
--
rossini at u.washington.edu
http://www.analytics.washington.edu/
Biomedical and Health Informatics University of Washington
Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email