[BioC] biomaRt getBM errors
James W. MacDonald
jmacdon at med.umich.edu
Thu Nov 29 15:38:48 CET 2007
Hi Nathaniel,
Nathaniel Street wrote:
> Hi
>
> I am trying to use biomaRt to automate the retrieval of information for
> Arabidopsis thaliana (my ultimate aim is actually to annotate poplar
> gene models based on arabidopsis best-BLAST results). I want to be able
> to extract GO information and to then construct an annotation package to
> enable me to use GOstats and other Bioconductor packages.
>
> Is AnnBuilder still the best option for constructing annotation
> packages? Has anyone come across a worked example of using biomaRt to
> retrieve data and then using this data to make an annotation package?
Seems like a difficult way to go about things -- biomaRt is intended for
more or less interactive annotation of things rather than simply getting
all annotations. Wouldn't it be easier to just download a database dump
from TAIR (or wherever one would get Arabidopsis info)?
>
> Here's the script I am running
>
> library(biomaRt)
> gramene<-useMart('ENSEMBL_MART_ENSEMBL')
> athmart<-useDataset("athaliana_gene_ensembl", mart = gramene)
> baseUrlAT<-"ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes"
> baseAT<-read.table(baseUrlAT, sep = "\t", as.is = TRUE, fill=TRUE, quote
> = "", comment.char = "", header=T)
> ath<-baseAT[,2]
> ath<-unique(ath)
> go<-getBM(attributes=c("tair_locus_model", "ptrichocarpa_ensembl_gene",
> "go"), values=ath, filters="tair_locus_model", mart=athmart)
>
> #1st attribute there because it's not returned by default
>
> Running this, I get the error message
>
> Error: ncol(result) == length(attributes) is not TRUE
Two things:
Using read.table is probably not the way you want to go in this
instance. Something like
baseAT <- scan(baseUrlAT, what=list("","",0,""), sep="\t", skip=1)
would be much more efficient.
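A minimal sketch of that scan() approach, run against a small made-up file rather than the TAIR FTP file. The four-column layout (character, character, numeric, character) is taken from the what= argument above; the sample rows and the names sample_lines, tmp, rec and ids are placeholders, not the real file contents.

```r
# Hypothetical tab-delimited sample standing in for TAIR_sequenced_genes;
# the real file's columns and values are not reproduced here.
sample_lines <- c(
  "col1\tcol2\tcol3\tcol4",
  "a\tAT1G01010.1\t1\tx",
  "b\tAT1G01020.1\t2\ty",
  "c\tAT1G01020.1\t3\tz"
)
tmp <- tempfile()
writeLines(sample_lines, tmp)

# what= gives one template per column: "" reads character, 0 reads numeric.
# skip = 1 drops the header line.
rec <- scan(tmp, what = list("", "", 0, ""), sep = "\t", skip = 1, quiet = TRUE)

# scan() returns a list with one vector per column; the IDs are in column 2.
ids <- unique(rec[[2]])
unlink(tmp)
```

Because scan() is told the type of every column up front, it skips the type guessing and data frame construction that read.table does.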
Ideally you would want to use the mysql interface for downloading lots
of information, but it appears this mart doesn't support mysql
(dangit!). Anyway, if you ask getBM() for output="list", things seem to
work. I don't know if this is expected or a bug (Steffen Durinck will
likely chime in if it is unexpected). Additionally, when you output as a
list you don't need to request the filter as an attribute as well: each
list element is named by the ID it belongs to, so there is no problem of
multiple lines per attribute.
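To illustrate the multiple-lines point without touching the mart, here is a small sketch using a hand-built table. The data frame tab is made up for illustration (its values are copied from the GO output in this message), and it is not what getBM itself returns.

```r
# Table-shaped result: an ID with several GO terms occupies several rows,
# so the ID column is needed to tell the rows apart.
tab <- data.frame(
  tair_locus_model = c("AT1G01010.1", "AT1G01010.1", "AT1G01010.1",
                       "AT1G01060.1"),
  go = c("GO:0007275", "GO:0003700", "GO:0005575", "GO:0003700"),
  stringsAsFactors = FALSE
)

# List-shaped result: one element per ID, named by that ID, holding all of
# its GO terms. The ID no longer has to be carried as a separate column.
by_id <- split(tab$go, tab$tair_locus_model)
by_id[["AT1G01010.1"]]
```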
> mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl",
mysql=T)
Loading required package: RMySQL
Loading required package: DBI
Error in useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl", mysql
= T) :
Requested BioMart database is not available please use the function
listMarts(mysql=TRUE) to see the valid biomart names you can query using
mysql access
> mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl")
Checking attributes and filters ... ok
> getBM(c("tair_locus_model","ptrichocarpa_ensembl_gene","go"),
"tair_locus_model", bst[[2]], mart, output="list")
$tair_locus_model
$tair_locus_model$AT1G01010.1
[1] "AT1G01010.1"
$tair_locus_model$AT1G01020.1
[1] "AT1G01020.1"
$tair_locus_model$AT1G01020.2
[1] "AT1G01020.2"
$tair_locus_model$AT1G01030.1
[1] "AT1G01030.1"
$tair_locus_model$AT1G01040.1
[1] "AT1G01040.1"
$tair_locus_model$DCL1
[1] NA
$tair_locus_model$AT1G01050.1
[1] "AT1G01050.1"
$tair_locus_model$AT1G01060.1
[1] "AT1G01060.1"
$tair_locus_model$AT1G01060.2
[1] "AT1G01060.2"
$tair_locus_model$AT1G01060.3
[1] "AT1G01060.3"
$ptrichocarpa_ensembl_gene
$ptrichocarpa_ensembl_gene$AT1G01010.1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01020.1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01020.2
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01030.1
[1] "gw1.XIV.1973.1"
$ptrichocarpa_ensembl_gene$AT1G01040.1
[1] "eugene3.00021687"
$ptrichocarpa_ensembl_gene$DCL1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01050.1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01060.1
[1] "estExt_Genewise1_v1.C_LG_XIV1950"
$ptrichocarpa_ensembl_gene$AT1G01060.2
[1] "estExt_Genewise1_v1.C_LG_XIV1950"
$ptrichocarpa_ensembl_gene$AT1G01060.3
[1] "estExt_Genewise1_v1.C_LG_XIV1950"
$go
$go$AT1G01010.1
[1] "GO:0007275" "GO:0003700" "GO:0005575"
$go$AT1G01020.1
[1] "GO:0008150" "GO:0003674" "GO:0016020"
$go$AT1G01020.2
[1] "GO:0008150" "GO:0003674" "GO:0016020"
$go$AT1G01030.1
[1] "GO:0003700" "GO:0005575" "GO:0009908"
[4] "GO:0045449" "GO:0048366"
$go$AT1G01040.1
[1] NA
$go$DCL1
[1] NA
$go$AT1G01050.1
[1] "GO:0008152" "GO:0016462" "GO:0004427"
[4] "GO:0005634" "GO:0016020" "GO:0005737"
$go$AT1G01060.1
[1] "GO:0003700"
$go$AT1G01060.2
[1] "GO:0003700"
$go$AT1G01060.3
[1] "GO:0003700"
> mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl")
Checking attributes and filters ... ok
> bs <- "ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes"
> bst <- scan(bs, sep="\t", what=list("","",0,""), skip=1, nlines=10)
Read 10 records
> getBM(c("ptrichocarpa_ensembl_gene","go"), "tair_locus_model",
bst[[2]], mart, output="list")
$ptrichocarpa_ensembl_gene
$ptrichocarpa_ensembl_gene$AT1G01010.1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01020.1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01020.2
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01030.1
[1] "gw1.XIV.1973.1"
$ptrichocarpa_ensembl_gene$AT1G01040.1
[1] "eugene3.00021687"
$ptrichocarpa_ensembl_gene$DCL1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01050.1
[1] NA
$ptrichocarpa_ensembl_gene$AT1G01060.1
[1] "estExt_Genewise1_v1.C_LG_XIV1950"
$ptrichocarpa_ensembl_gene$AT1G01060.2
[1] "estExt_Genewise1_v1.C_LG_XIV1950"
$ptrichocarpa_ensembl_gene$AT1G01060.3
[1] "estExt_Genewise1_v1.C_LG_XIV1950"
$go
$go$AT1G01010.1
[1] "GO:0007275" "GO:0003700" "GO:0005575"
$go$AT1G01020.1
[1] "GO:0008150" "GO:0003674" "GO:0016020"
$go$AT1G01020.2
[1] "GO:0008150" "GO:0003674" "GO:0016020"
$go$AT1G01030.1
[1] "GO:0003700" "GO:0005575" "GO:0009908"
[4] "GO:0045449" "GO:0048366"
$go$AT1G01040.1
[1] NA
$go$DCL1
[1] NA
$go$AT1G01050.1
[1] "GO:0008152" "GO:0016462" "GO:0004427"
[4] "GO:0005634" "GO:0016020" "GO:0005737"
$go$AT1G01060.1
[1] "GO:0003700"
$go$AT1G01060.2
[1] "GO:0003700"
$go$AT1G01060.3
[1] "GO:0003700"
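If you later need the list output as a two-column table (for example as input to whatever builds the annotation package), something like the following should do it. This is only a sketch: the list go below is typed in by hand from three of the elements shown above, and go_long is just a name I picked.

```r
# Hand-copied subset of the $go element from the output above.
go <- list(
  "AT1G01010.1" = c("GO:0007275", "GO:0003700", "GO:0005575"),
  "AT1G01040.1" = NA,
  "AT1G01060.1" = "GO:0003700"
)

# Repeat each ID once per GO term, then line the terms up beside the IDs.
go_long <- data.frame(
  tair_locus_model = rep(names(go), lengths(go)),
  go = unlist(go, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop IDs for which no GO term was returned.
go_long <- go_long[!is.na(go_long$go), ]
```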
Best,
Jim
>
> If I run getBM for individual entries in ath and retrieve only the
> attribute "tair_locus_model", it always works (I have tried a large
> number of AGI codes chosen at random from ath). However, running getBM
> to retrieve only "tair_locus_model" for all of ath fails (it returns
> only 2 results even though there are >40,000 entries in ath), and
> running getBM on individual entries of ath for all the attributes I
> want also fails with the same error message as above.
>
> I'm not sure if this is a problem with my code, a biomaRt issue or an
> issue specific to the use of Gramene.
>
> Any help much appreciated.
>
> Thanks
>
> Nathaniel Street
>
> SessionInfo
>
> R version 2.6.0 (2007-10-03)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] AnnBuilder_1.16.0 annotate_1.16.1 xtable_1.5-2
> AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 XML_1.93-2.1
>
> [8] Biobase_1.16.1 biomaRt_1.12.2 RCurl_0.8-1
>
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623