[BioC] Error message from AnnBuilder

Fri Mar 31 23:00:09 CEST 2006

Hi Hua,

On Wed, 29 Mar 2006, Hua Weng wrote:

> Hi Ting-Yuan:
> 
> As you suggested, I tried to use "live link" and get files through the web.
> First I set source URL like this:
> mySrcUrls <- c(EG="ftp://ftp.ncbi.nih.gov/gene/DATA",
> UG="ftp://ftp.ncbi.nih.gov/repository/UniGene/Bos_taurus/Bt.data.gz",
> GP="ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes",
> GO="http://www.godatabase.org/dev/database/archive/latest",
> KEGG="ftp://ftp.genome.ad.jp/pub/kegg/pathways")
> I got the following errors:
> Error in srcUrls[["KEGGGENOME"]] : subscript out of bounds
> In addition: Warning message:
> Organism Bos taurus is not supported by GoldenPath (GP). in:
> getUCSCUrl(organism)

This is not what I suggested.  What I said was NOT to use the srcUrls and
fromWeb arguments in ABPkgBuilder.  In other words, you should run
ABPkgBuilder like this:

ABPkgBuilder(baseName="hgu95av2.GeneBankID",
                 baseMapType="gbNRef",
                 pkgName="hgu95av2",
                 pkgPath=".",
                 organism="Homo sapiens",
                 version="0.0.0",
                 author=list(
                   authors="Ting-Yuan Liu, ChenWei Lin, Seth Falcon, Jianhua Zhang, James W. MacDonald",
                   maintainer="Ting-Yuan Liu <tliu at fhcrc.org>"
                   )
                 )

See?  I didn't use the srcUrls and fromWeb arguments.  This is what I
mean.  AnnBuilder knows how to get the correct source URLs, so you
don't have to worry about them.
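
For what it is worth, "subscript out of bounds" is the error R gives when a
named vector is indexed with [[ ]] by a name it does not contain.  Below is a
minimal base-R sketch of that behaviour.  It assumes, going only by the error
message quoted above, that a KEGGGENOME entry is looked up in srcUrls; the
list of needed names is made up for illustration and is not AnnBuilder's own
code.

```r
# Custom source URLs similar to the quoted message: there is no KEGGGENOME entry.
mySrcUrls <- c(EG   = "ftp://ftp.ncbi.nih.gov/gene/DATA",
               GO   = "http://www.godatabase.org/dev/database/archive/latest",
               KEGG = "ftp://ftp.genome.ad.jp/pub/kegg/pathways")

# Indexing a named character vector with a missing name raises an error;
# capture the message instead of stopping.
msg <- tryCatch(mySrcUrls[["KEGGGENOME"]],
                error = function(e) conditionMessage(e))
print(msg)

# Hypothetical pre-flight check: the names in 'needed' are an assumption
# made for this example only.
needed <- c("EG", "GO", "KEGG", "KEGGGENOME")
missingNames <- setdiff(needed, names(mySrcUrls))
print(missingNames)
```

A check like this on a hand-written URL vector shows which entries are absent
before a long build is started.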

> 
> I changed the source URL as following, and then it works.
> mySrcUrls <- c(EG="ftp://ftp.ncbi.nih.gov/gene/DATA",
> UG="ftp://ftp.ncbi.nih.gov/repository/UniGene/Bos_taurus/Bt.data.gz",
> 	GO="http://www.godatabase.org/dev/database/archive/latest")
> But I still get seven environments and nothing useful back. I checked my
> GenBank accession IDs, and they can be mapped to Gene IDs, UniGene IDs and
> possibly GO information. Is this because AnnBuilder cannot handle
> organisms other than human, mouse and rat? Have you tested building
> annotation packages for other organisms such as cow or rice?
> 

You mean you checked your GenBank accession IDs on the NCBI website, right?
Actually, not all the information you can find on the NCBI website is
included in the NCBI downloadable files, and AnnBuilder builds packages
from these downloadable files.
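
One way to see how much of a base file can be mapped is to compare its
accessions with the downloaded table directly.  The sketch below uses small
in-memory stand-ins rather than the real files; the column names and GeneID
values in 'src' are made up for illustration, and it assumes the downloaded
accessions carry a ".version" suffix that has to be removed before matching.

```r
# Toy stand-in for the base file (probe ID, GenBank/RefSeq accession).
base <- data.frame(probe = c("a2g09", "g1o22", "a1d09", "f5l19"),
                   acc   = c("NM_174062", NA, "XM_879288", "BC102351"),
                   stringsAsFactors = FALSE)

# Toy stand-in for rows of a downloaded accession-to-gene table.
# Column names and GeneID values are hypothetical.
src <- data.frame(tax_id  = c(9913, 9913),
                  GeneID  = c(1001, 1002),
                  rna_acc = c("NM_174062.2", "BC102351.1"),
                  stringsAsFactors = FALSE)

# Drop the ".version" suffix so accessions compare like with like.
src$rna_acc <- sub("\\.[0-9]+$", "", src$rna_acc)

# Flag the probes whose accession appears in the downloaded table.
base$inSource <- !is.na(base$acc) & base$acc %in% src$rna_acc
print(base)
cat(sum(base$inSource), "of", nrow(base), "probes can be mapped\n")
```

Probes flagged FALSE here would end up without annotation, even if the same
accession can be looked up by hand on the website.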

We tried to use ABPkgBuilder to build annotation packages for Affymetrix
grape chips, but without success.  We had the same problem building
Arabidopsis annotation packages with ABPkgBuilder, and therefore we
developed a new function in AnnBuilder that builds them from data we
found outside of NCBI.  It seems that your case is very similar to
Arabidopsis.

One thing you can do, if there are no confidentiality issues, is send me
the basefiles (please do not send them to the list) and I will try to
build the package and see if I have the same problem.

HTH,
Ting-Yuan

> Thank you very much for your advice.
> Hua
> 
> -----Original Message-----
> From: Ting-Yuan Liu [mailto:tliu at fhcrc.org] 
> Sent: Wednesday, March 29, 2006 12:07 PM
> To: Hua Weng
> Cc: bioconductor at stat.math.ethz.ch
> Subject: RE: Error message from AnnBuilder
> 
> 
> Hi Hua,
> 
> On Tue, 28 Mar 2006, Hua Weng wrote:
> 
> > My questions are:
> > 1) If I provide more local annotation files, may I get more information back?
> 
> Using more annotation files (basefiles) could improve the mapping
> results, but not the number of environments.  It is always a good idea to
> provide as many basefiles as you can.
> 
> I am not sure why you didn't get many environments.  Could you try building
> the annotation package without local files?  I mean you should remove
> the srcUrls and fromWeb arguments so that ABPkgBuilder can download
> the data from the web.
> 
> > 2) I didn't get any GO term back, does it mean these genes for cow don't have
> > any GO?
> 
> No.  If cow doesn't have any associated GO information, you will get an
> environment whose values are all NAs.  See (1) for details on how to get
> more environments.
> 
> > 3) If I map these cow GenBank accession IDs to Mus musculus, can I get
> > some useful information back? Do I need to change the cow GenBank accession
> > IDs to mouse GenBank accession IDs?
> 
> I am not sure if I understand what you mean here.  Are you interested in 
> the homology between cow and mouse?  The package btahomology might be 
> what you want.  You can find it at
> http://www.bioconductor.org/packages/data/annotation/1.8/html/btahomology.html
> 
> HTH,
> Ting-Yuan
> ______________________________________
> Ting-Yuan Liu
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> Seattle, WA, USA
> ______________________________________
> 
> > 
> > Thank you very much for your response!
> > 
> > Hua
> > 
> > -----Original Message-----
> > From: Ting-Yuan Liu [mailto:tliu at fhcrc.org] 
> > Sent: Tuesday, March 28, 2006 3:21 PM
> > To: Hua Weng
> > Cc: bioconductor at stat.math.ethz.ch
> > Subject: RE: Error message from AnnBuilder
> > 
> > 
> > Hi Hua,
> > 
> > The "subscript out of bounds" bug has been fixed in the development
> > version of AnnBuilder, I believe.  Please give it a try.
> > 
> > HTH,
> > Ting-Yuan   
> > 
> > On Tue, 28 Mar 2006, Hua Weng wrote:
> > 
> > > 
> > > Dear List and Ting-Yuan:
> > > > I finally decided to use AnnBuilder on a Linux server. I got the sample data
> > > > set, thgu95a, to work and successfully installed the annotation package.
> > > > But when I tried to use a data set for cow (Bos taurus), I got the following
> > > > error message:
> > > 
> > > Error in all(is.na(annotation[, "GO"])) : subscript out of bounds
> > > 
> > > > I don't know what this error means. Is it because my data set cannot get
> > > > any GO term?
> > > 
> > > The code is as follow:
> > > > library("AnnBuilder")
> > > > myBase <- file.path("cluster6_Asitha_Bt.txt")
> > > > myDir <- "/home/hua/project/bioconductor/AnnBuilder/"
> > > > myBaseType="gbNRef"
> > > > > mySrcUrls <- c(EG="file:////home/hua/project/bioconductor/AnnBuilder/gene_DATA",
> > > > + UG="file:////home/hua/project/bioconductor/AnnBuilder/UniGene/Bos_taurus/Bt.data.gz",
> > > > + GO="file:////home/hua/project/bioconductor/AnnBuilder/go_200603-termdb.rdf-xml.gz")
> > > > > ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = myBaseType,
> > > > + pkgName = "AsithaBtPkg", pkgPath = myDir, organism = "Bos taurus", version = "1.1.0",
> > > > + author = list(author = "Hua Weng", maintainer = "Hua Weng <hweng at biochem.okstate.edu>"), fromWeb = FALSE)
> > > 
> > > The following is data set look like:
> > > > myBase
> > > [1] "cluster6_Asitha_Bt.txt"
> > > > read.table(myBase, sep="\t", header=FALSE, as.is=TRUE)
> > >       V1           V2
> > > 1  a2g09    NM_174062
> > > 2  g1o22         <NA>
> > > 3  a1d09    XM_879288
> > > 4  a1e10    NM_175825
> > > 5  g4n11    XM_873598
> > > 6  g1b02         <NA>
> > > 7  f7c16         <NA>
> > > 8  a1h04    XM_580317
> > > 9  f5l19     BC102351
> > > 10 g4p13    XM_879908
> > > 11 g4k22    NM_173968
> > > 12 f6d15    XM_874804
> > > 13 g4l22    XM_615696
> > > 14 g1h03    XM_873394
> > > 15 a1d10    NM_174658
> > > 16 f6c14         <NA>
> > > 17 g4k13	NM_001034575
> > > 18 f7k05    XM_868174
> > > 19 g4k23         <NA>
> > > 20 f6k09	NM_001007815
> > > 21 f6d16    NM_174792
> > > 22 g4f07         <NA>
> > > 23 f5k24     BT021073
> > > 
> > > > The first column is the probe ID and the second column is the GenBank
> > > > accession ID for Bos taurus. If I want to get the annotation for Mus
> > > > musculus, can I still use the GenBank accession IDs for Bos taurus?
> > > 
> > > > sessionInfo()
> > > R version 2.2.0, 2005-10-06, i686-pc-linux-gnu
> > > 
> > > attached base packages:
> > > [1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
> > > [7] "datasets"  "base"
> > > 
> > > other attached packages:
> > >         GO AnnBuilder   annotate        XML    Biobase
> > >   "1.10.0"    "1.8.0"    "1.8.0"   "0.99-6"    "1.8.0"
> > > 
> > > I highly appreciate any comments and suggestions.
> > > 
> > > Thanks,
> > > Hua
> > > 
> > > 
> > > -----Original Message-----
> > > From: Ting-Yuan Liu [mailto:tliu at fhcrc.org] 
> > > Sent: Friday, March 24, 2006 11:48 AM
> > > To: Hua Weng
> > > Cc: bioconductor at stat.math.ethz.ch
> > > Subject: Re: AnnBuilder
> > > 
> > > 
> > > Hi Hua,
> > > 
> > > > Yes, you could run AnnBuilder on a Windows system.  That is not what I
> > > > usually do, but I tried and succeeded.  However, R on my Windows
> > > > machine is "built from source" (see section 3.1 of the manual "R
> > > > Installation and Administration") and it might be a little different from
> > > > your R (which is installed from the binary installer, I guess).  Someone
> > > > reported to me that they were unable to run AnnBuilder on Windows,
> > > > but it did work on my machine.  Therefore, you can first try to see if you
> > > > can build annotation packages with the binary-installed R.  If not, you
> > > > should switch to the source-built R.
> > > 
> > > HTH,
> > > Ting-Yuan
> > > 
> > > On Wed, 22 Mar 2006, Hua Weng wrote:
> > > 
> > > > > Hi, Bioconductor list and Ting-Yuan:
> > > > > 
> > > > > I have problems using the AnnBuilder package.
> > > > > 
> > > > > 1) May I use a Windows-based R environment to run ABPkgBuilder? I
> > > > > haven't been able to run this command successfully. I saw that there is a
> > > > > condition before this command, "if(.Platform$OS != "windows" && interactive())".
> > > > > Does this mean the command cannot run on the Windows platform?
> > > > > 
> > > > > 2) I also tried to install AnnBuilder in R 2.2.0 on a Linux server, but I
> > > > > haven't been able to install it. The problem is that before I could
> > > > > install the XML package, it gave me the error message "****    You should use a
> > > > > recent version of libxml2, i.e. 2.6.22 or higher  ****". And when I tried to
> > > > > install 'libxml2', I got the following error:
> > > > > 
> > > > >    > install.packages("libxml2")
> > > > > Warning in download.packages(pkgs, destdir = tmpd, available = available, :
> > > > >          no package 'libxml2' at the repositories
> > > > > 
> > > > > So I want to ask how I can successfully install 'libxml2' on the Linux server.
> > > > > 
> > > > > > sessionInfo()
> > > > > R version 2.2.0, 2005-10-06, i686-pc-linux-gnu
> > > > > 
> > > > > attached base packages:
> > > > > [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> > > > > [7] "base"
> > > > > 
> > > > > 3) I found that the UniGene source URL always points to 'Homo sapiens' data,
> > > > > even for organisms other than 'Homo sapiens'. Is that true?
> > > > > 
> > > > > Thanks for your attention!
> > > > > 
> > > > > Hua Weng
> > > > > Microarray Core Facility
> > > > > Oklahoma State University
> > > > > Department of Biochemistry and Molecular Biology
> > > > > 246 Noble Research Center
> > > > > Stillwater, OK  74078
> > > >  
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
>