[BioC] biomaRt issues

Leonardo Collado Torres lcollado at lcg.unam.mx
Tue Sep 8 07:43:08 CEST 2009


Hello BioC users :)

I'm having some trouble using biomaRt with the UniProt database.

# I can execute the following code and everything works fine (with Ensembl):
library(biomaRt)
bsub <- useMart( "bacterial_mart_54", dataset = "bac_6_gene")
res <- getBM( attributes=c("start_position", "end_position", "strand", 
"status"), filters= c("start", "end"), values = list("1", "100000"), 
mart = bsub)
library(lattice)
print(xyplot(end_position~start_position | status, group=strand, 
data=res, auto.key=TRUE))

# But then, if I want to retrieve the EC numbers and organism info for
# the viral proteins in UniProt, this should work
# (I first ran the same query through
# http://www.ebi.ac.uk/uniprot/biomart/martview and it worked):
library(biomaRt)
uni <- useMart("uniprot_mart", dataset="UNIPROT")
virus <- getBM(attributes = c("ec_number","organism"), filters = 
"superregnum_name", values = "Viruses", mart = uni)
dim(virus)
[1] 0 2
# But the virus object has 0 rows. The same happens if I use
# checkFilters = FALSE.
# Using the website app, I do get information back.
# If I check only the "organism" attribute, then I do get some information.
virus2 <- getBM(attributes = c("organism"), filters = 
"superregnum_name", values = "Viruses", mart = uni)
dim(virus2)
[1] 5063    1
# However, I recreated the "virus2" object a few minutes later and got a
# different result (I repeated this around 4 times and got the same numbers):
virus2 <- getBM(attributes = c("organism"), filters = 
"superregnum_name", values = "Viruses", mart=uni)
dim(virus2)
[1] 158   1
# Then I ran it once more after typing the lines above into this mail, and
# got the original result again:
virus2 <- getBM(attributes = c("organism"), filters = 
"superregnum_name", values = "Viruses", mart=uni)
dim(virus2)
[1] 5063    1
# I'm pretty sure that I didn't lose my internet connection in the
# meantime, so I don't really know what is causing these differences.
# I then tried the same lines on a different machine (on a different
# network too). At first I got the same 5063 rows, and then I got:
virus2 <- getBM(attributes = c("organism"), filters = 
"superregnum_name", values = "Viruses", mart=uni)
dim(virus2)
[1] 8431    1
# Then 5063 again, etc.
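
The manual re-running described above could be wrapped in a small helper. This is only a sketch under stated assumptions: check_query_stability is a hypothetical name (it is not a biomaRt function), and it assumes that a changing row count is the only symptom worth detecting. It still calls the query repeatedly, so it automates the check rather than removing the need for it. The demo uses an offline stand-in for getBM so it runs without a network connection.

```r
# Hypothetical helper (not part of biomaRt): run the same query several
# times and summarise how often each row count comes back.
check_query_stability <- function(query_fun, n = 5) {
  # query_fun: a zero-argument function returning a data.frame, e.g.
  #   function() getBM(attributes = "organism",
  #                    filters = "superregnum_name",
  #                    values = "Viruses", mart = uni)
  results <- lapply(seq_len(n), function(i) query_fun())
  n_rows <- vapply(results, nrow, integer(1))
  list(
    n_rows     = n_rows,                          # row count of each run
    counts     = table(n_rows),                   # how often each count occurred
    consistent = length(unique(n_rows)) == 1L,    # TRUE only if all runs agree
    largest    = results[[which.max(n_rows)]]     # biggest result, NOT guaranteed complete
  )
}

# Offline stand-in for the real query, cycling through the sizes seen above.
sizes <- c(5063L, 158L, 5063L, 8431L, 5063L)
call_no <- 0
fake_query <- function() {
  call_no <<- call_no + 1
  data.frame(organism = rep("placeholder", sizes[call_no]))
}

check <- check_query_stability(fake_query, n = 5)
check$consistent
check$counts
```

If consistent comes back FALSE, none of the results should be trusted as complete, including the largest one.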

In the end, 5063 seems to pop up more frequently, but is it the actual 
result? Is there a way to make sure I'm not missing information without 
calling getBM multiple times to check that there are no unexpected results?
I had assigned some homework exercises that use biomaRt to access UniProt, 
but now I'm confused myself about what's going on :P
Any tips would be great :) Thanks!

Leonardo


# Session info for the first computer
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-07-21 r48968)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] lattice_0.17-25 biomaRt_2.1.0 

loaded via a namespace (and not attached):
[1] grid_2.10.0  RCurl_0.98-1 XML_2.5-1  

# Session info for the second computer
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-10 r49131)
sparc-sun-solaris2.9

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.1.0

loaded via a namespace (and not attached):
[1] RCurl_1.2-0 XML_2.6-0

-- 
Leonardo Collado Torres, Bachelor in Genomic Sciences
Professor at LCG and member of Dr. Enrique Morett's lab
UNAM Campus Cuernavaca, Mexico

Homepage: http://www.lcg.unam.mx/~lcollado/
Phone: [52] (777) 313-28-05


