[BioC] biomaRt getBM errors
Nathaniel Street
nathaniel.street at plantphys.umu.se
Thu Nov 29 14:33:57 CET 2007
Hi
I am trying to use biomaRt to automate the retrieval of information for
Arabidopsis thaliana (my ultimate aim is to annotate poplar gene models
based on Arabidopsis best-BLAST results). I want to extract GO
information and then construct an annotation package so that I can use
GOstats and other Bioconductor packages.
Is AnnBuilder still the best option for constructing annotation
packages? Has anyone come across a worked example of using biomaRt to
retrieve data and then using that data to make an annotation package?
Here's the script I am running:
library(biomaRt)
gramene <- useMart("ENSEMBL_MART_ENSEMBL")
athmart <- useDataset("athaliana_gene_ensembl", mart = gramene)
baseUrlAT <- "ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes"
baseAT <- read.table(baseUrlAT, sep = "\t", as.is = TRUE, fill = TRUE,
                     quote = "", comment.char = "", header = TRUE)
# column 2 holds the AGI codes; drop duplicates
ath <- unique(baseAT[, 2])
# "tair_locus_model" is the 1st attribute because it's not returned by default
go <- getBM(attributes = c("tair_locus_model", "ptrichocarpa_ensembl_gene", "go"),
            filters = "tair_locus_model", values = ath, mart = athmart)
Running this, I get the error message:
Error: ncol(result) == length(attributes) is not TRUE
If I run getBM for individual entries in ath and retrieve only the
attribute "tair_locus_model", it always works (I have tried a large
number of AGI codes picked at random from ath). However, running getBM
to retrieve only "tair_locus_model" for all entries in ath fails: it
returns only 2 results even though there are >40,000 entries in ath.
Running getBM on individual entries of ath but for all the attributes I
want also fails, with the same error message as above.
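Since single identifiers work but the full vector does not, one way to narrow the first problem down might be to send the identifiers in smaller batches and see which batches fail. The following is only a minimal sketch of that idea, not a known fix: split_into_batches and query_in_batches are hypothetical helper names, the batch size of 500 is an arbitrary assumption, and the getBM call in the trailing comment simply reuses the arguments from the script above.

```r
# Hypothetical helper: split a vector of IDs into batches of at most `size`.
split_into_batches <- function(ids, size = 500) {
  stopifnot(size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Hypothetical helper: call `query_fun` on each batch and row-bind the
# data frames it returns. A batch that raises an error is reported with a
# warning and skipped, so one failing batch does not abort the whole run.
query_in_batches <- function(ids, query_fun, size = 500) {
  results <- lapply(split_into_batches(ids, size), function(batch) {
    tryCatch(
      query_fun(batch),
      error = function(e) {
        warning("batch starting at ", batch[1], " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  results <- Filter(Negate(is.null), results)  # drop the failed batches
  do.call(rbind, results)                      # NULL if nothing succeeded
}

# Possible use with the objects from the script above (needs network access):
# go <- query_in_batches(ath, function(batch) {
#   getBM(attributes = "tair_locus_model", filters = "tair_locus_model",
#         values = batch, mart = athmart)
# })
```

The query function is passed in as an argument so that the batching logic can be checked without a connection to the mart. Any warnings should point at the batches that contain problematic identifiers.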
I'm not sure if this is a problem with my code, a biomaRt issue or an
issue specific to the use of Gramene.
Any help much appreciated.
Thanks
Nathaniel Street
SessionInfo
R version 2.6.0 (2007-10-03)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnBuilder_1.16.0 annotate_1.16.1 xtable_1.5-2 AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 XML_1.93-2.1
[8] Biobase_1.16.1 biomaRt_1.12.2 RCurl_0.8-1
--
Nathaniel Street
Umeå Plant Science Centre
Department of Plant Physiology
University of Umeå
SE-901 87 Umeå
SWEDEN
email: nathaniel.street at plantphys.umu.se
tel: +46-90-786 5477
fax: +46-90-786 6676
www.upsc.se
http://www.citeulike.org/user/natstreet