[BioC] biomaRt issues
Wolfgang Huber
whuber at embl.de
Tue Sep 8 16:57:27 CEST 2009
Dear Leonardo
Thank you for your clear and helpful problem report!
The lack of returned results (in one case), and the irreproducibility of
returned results (in another case) seem to be a problem of the UniProt
mart rather than of the biomaRt package per se. I am cc'ing Jie Luo at EBI
who, as far as I understand, would be the most appropriate person to respond
here, and perhaps help to localise and eliminate the problem.
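
Until the server-side cause is found, one way to at least detect the
irreproducibility from the R side is to repeat a query a few times and
compare the row counts. The following is only a minimal sketch:
check_reproducible() and its arguments are hypothetical names introduced
here for illustration, not part of biomaRt, and agreement between runs does
not prove that the result is complete.

```r
# Hypothetical helper (not part of biomaRt): call a zero-argument query
# function `n` times and report whether every call returned the same
# number of rows.
check_reproducible <- function(query_fun, n = 5, pause = 0) {
  results <- vector("list", n)
  for (i in seq_len(n)) {
    results[[i]] <- query_fun()
    # Optional pause (in seconds) between calls, to be gentle on the server
    if (pause > 0 && i < n) Sys.sleep(pause)
  }
  rows <- vapply(results, nrow, integer(1))
  list(rows = rows,
       consistent = length(unique(rows)) == 1L,
       results = results)
}

# Usage with the query from the report (needs network access, so it is
# left commented out here):
# library(biomaRt)
# uni <- useMart("uniprot_mart", dataset = "UNIPROT")
# chk <- check_reproducible(function()
#   getBM(attributes = "organism", filters = "superregnum_name",
#         values = "Viruses", mart = uni), n = 5, pause = 2)
# chk$rows        # row count of each run
# chk$consistent  # TRUE only if all runs returned the same number of rows
```

If chk$consistent is FALSE, the individual data frames in chk$results can be
compared to see which rows differ between runs.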
Best wishes
Wolfgang
Leonardo Collado Torres wrote:
> Hello BioC users :)
>
> I'm having some trouble with biomaRt with the uniprot database.
>
> #I can execute the following code and everything works fine (with ENSEMBL):
> library(biomaRt)
> bsub <- useMart( "bacterial_mart_54", dataset = "bac_6_gene")
> res <- getBM( attributes=c("start_position", "end_position", "strand",
> "status"), filters= c("start", "end"), values = list("1", "100000"),
> mart = bsub)
> library(lattice)
> print(xyplot(end_position~start_position | status, group=strand,
> data=res, auto.key=TRUE))
>
> #But then, if I want to retrieve the EC numbers and organism info for
> the viral proteins on Uniprot, this should work:
> # (I did it first through http://www.ebi.ac.uk/uniprot/biomart/martview
> and it worked)
> library(biomaRt)
> uni <- useMart("uniprot_mart", dataset="UNIPROT")
> virus <- getBM(attributes = c("ec_number","organism"), filters =
> "superregnum_name", values = "Viruses", mart = uni)
> dim(virus)
> [1] 0 2
> # But the virus object has 0 rows. The same happens if I use
> checkFilters = FALSE
> # Using the website app, I do get information back.
> # If I check only the "organism" attribute, then I do get some information.
> virus2 <- getBM(attributes = c("organism"), filters =
> "superregnum_name", values = "Viruses", mart = uni)
> dim(virus2)
> [1] 5063 1
> # However, I re-created the "virus2" object a few minutes later and got a
> different result (I checked around 4 times and got the same numbers):
> virus2 <- getBM(attributes = c("organism"), filters =
> "superregnum_name", values = "Viruses", mart=uni)
> dim(virus2)
> [1] 158 1
> # Then I ran it once more after typing the above lines into this mail, and
> got the original result again:
> virus2 <- getBM(attributes = c("organism"), filters =
> "superregnum_name", values = "Viruses", mart=uni)
> dim(virus2)
> [1] 5063 1
> # I'm pretty sure that I didn't lose my internet connection in the
> meantime, so I don't really know what is causing this error.
> # I then tried the same lines on a different machine (different network
> too) and at first I got the same 5063 row value, and then I got:
> virus2 <- getBM(attributes = c("organism"), filters =
> "superregnum_name", values = "Viruses", mart=uni)
> dim(virus2)
> [1] 8431 1
> # Then 5063 again, etc.
>
> In the end, 5063 seems to pop up more frequently, but is it the actual
> result? Is there a way to make sure I'm not missing information without
> calling getBM multiple times to check that there are no unexpected results?
> I had assigned some homework exercises using biomaRt to access Uniprot,
> but now I'm confused myself about what's going on :P
> Any tips will be great :) Thanks!
>
> Leonardo
>
>
> # First comp session info
> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-07-21 r48968)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] lattice_0.17-25 biomaRt_2.1.0
> loaded via a namespace (and not attached):
> [1] grid_2.10.0 RCurl_0.98-1 XML_2.5-1
> # Second comp session info
> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-10 r49131)
> sparc-sun-solaris2.9
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.1.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.2-0 XML_2.6-0
>
--
Best wishes
Wolfgang
-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber