[BioC] biomaRt getBM errors

Nathaniel Street nathaniel.street at plantphys.umu.se
Thu Nov 29 14:33:57 CET 2007


Hi

I am trying to use biomaRt to automate the retrieval of information for
Arabidopsis thaliana (my ultimate aim is actually to annotate poplar
gene models based on arabidopsis best-BLAST results). I want to be able
to extract GO information and to then construct an annotation package to
enable me to use GOstats and other Bioconductor packages.

Is AnnBuilder still the best option for constructing annotation
packages? Has anyone come across a worked example of using biomaRt to
retrieve data and then using that data to make an annotation package?

Here's the script I am running:

library(biomaRt)
gramene <- useMart('ENSEMBL_MART_ENSEMBL')
athmart <- useDataset("athaliana_gene_ensembl", mart = gramene)
baseUrlAT <- "ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes"
baseAT <- read.table(baseUrlAT, sep = "\t", as.is = TRUE, fill = TRUE,
                     quote = "", comment.char = "", header = TRUE)
# column 2 of the TAIR file holds the identifiers used as filter values
ath <- unique(baseAT[, 2])
# "tair_locus_model" is listed as the 1st attribute because it is not
# returned by default
go <- getBM(attributes = c("tair_locus_model",
                           "ptrichocarpa_ensembl_gene", "go"),
            filters = "tair_locus_model", values = ath, mart = athmart)

Running this, I get the error message:

Error: ncol(result) == length(attributes) is not TRUE

If I run getBM for a single entry of ath and retrieve only the attribute
"tair_locus_model", it always works (I have tried a large number of
randomly chosen AGI codes from ath). However, running getBM to retrieve
only "tair_locus_model" for all entries of ath fails: it returns only 2
results even though there are >40,000 entries in ath. Running getBM on a
single entry of ath, but for all the attributes I want, also fails with
the same error message as above.
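
The tests described above could be automated by querying ath in batches
and recording which batches fail. The following is only a minimal sketch:
chunk_ids and run_batches are hypothetical helpers (not biomaRt
functions), the batch size is an arbitrary assumption, and the demo uses
a stand-in query with illustrative IDs so that it runs without a network
connection.

```r
# Split a vector of IDs into batches of at most `size` elements.
# `chunk_ids` is a hypothetical helper, not part of biomaRt.
chunk_ids <- function(ids, size = 500) {
  if (length(ids) == 0) return(list())
  split(ids, ceiling(seq_along(ids) / size))
}

# Run `query_fun` on each batch, catching errors so that one failing
# batch does not stop the loop. Failed batches come back as condition
# objects instead of data frames.
run_batches <- function(ids, query_fun, size = 500) {
  batches <- chunk_ids(ids, size)
  lapply(batches, function(b) {
    tryCatch(query_fun(b), error = function(e) e)
  })
}

# Offline demonstration: a stand-in query that fails on one made-up ID.
demo_ids <- c("AT1G01010.1", "AT1G01020.1", "bad_id", "AT1G01030.1")
demo_query <- function(b) {
  if ("bad_id" %in% b) stop("ncol(result) == length(attributes) is not TRUE")
  data.frame(tair_locus_model = b, stringsAsFactors = FALSE)
}
res <- run_batches(demo_ids, demo_query, size = 2)

# TRUE for every batch whose query raised an error
failed <- vapply(res, inherits, logical(1), what = "error")
```

With the real data, query_fun would wrap the getBM call from the script,
for example function(b) getBM(attributes = "tair_locus_model",
filters = "tair_locus_model", values = b, mart = athmart), and the
batches flagged in failed could then be split further to isolate the
problem entries.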

I'm not sure if this is a problem with my code, a biomaRt issue or an
issue specific to the use of Gramene.

Any help much appreciated.

Thanks

Nathaniel Street

SessionInfo

R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] AnnBuilder_1.16.0   annotate_1.16.1     xtable_1.5-2        AnnotationDbi_1.0.6
 [5] RSQLite_0.6-4       DBI_0.2-4           XML_1.93-2.1        Biobase_1.16.1
 [9] biomaRt_1.12.2      RCurl_0.8-1


-- 
Nathaniel Street
Umeå Plant Science Centre
Department of Plant Physiology
University of Umeå
SE-901 87 Umeå
SWEDEN

email: nathaniel.street at plantphys.umu.se
tel: +46-90-786 5477
fax:  +46-90-786 6676
www.upsc.se
http://www.citeulike.org/user/natstreet


