[BioC] getBM failing on complete probe list for hgu133plus2

Wolfgang Huber huber at ebi.ac.uk
Thu Jun 1 14:40:11 CEST 2006


Dear Obi,

thanks for the bug report. This problem is specific to the output="list"
option of getBM. Please try the following code, which avoids the bug by
omitting that option (getBM then returns a data frame). It works for me
using biomaRt from the Bioconductor 1.8 release; note that the script took
about 5 min of wall-clock time in my case:

##########################################################

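library("biomaRt")
library("hgu133plus2")

# all probe set IDs on the chip, taken from the annotation package
probeids = ls(hgu133plus2ACCNUM)

# connect to the Ensembl human gene dataset
mart = useMart("ensembl", "hsapiens_gene_ensembl")

# time the query; without output="list" the result is a data frame
print(system.time({

annotations=getBM(
  attributes=c("affy_hg_u133_plus_2", "ensembl_peptide_id","entrezgene",
               "unified_uniprot_accession", "uniprot_swissprot_accession"),
  filter="affy_hg_u133_plus_2",
  values=probeids, mart=mart,
  na.value="NA")

}))

# compact summary of the returned data frame
print(str(annotations))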

##########################################################


> source("test.R")
Loading required package: XML
Loading required package: RCurl
Checking attributes and filters ... ok
[1]  27.934   0.516 319.800   0.000   0.000
`data.frame':   252222 obs. of  5 variables:
 $ affy_hg_u133_plus_2        : chr  "232806_s_at" "221904_at" "232806_s_at" "232807_at" ...
 $ ensembl_peptide_id         : chr  "ENSP00000340974" "ENSP00000340974" "ENSP00000373360" "ENSP00000373360" ...
 $ entrezgene                 : int  NA NA NA NA 131408 131408 10752 10752 10752 27255 ...
 $ unified_uniprot_accession  : chr  "Q8TA84" "Q8TA84" "Q6ZTF8" "Q6ZTF8" ...
 $ uniprot_swissprot_accession: chr  "" "" "" "" ...
NULL
> length(unique(annotations$unified_uniprot_accession))
[1] 42193
> length(unique(annotations$affy_hg_u133_plus_2))
[1] 25188

#############################################################
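
If you still need something close to the list output, one possible way is
to build it from the data frame yourself. The sketch below is only an
illustration in base R: the toy data frame stands in for the real getBM
result (its four rows are the first values shown in the str() output
above), and the name probe2uniprot is my own choice, not part of biomaRt.

```r
# Toy stand-in for the data frame returned by getBM; the four rows are the
# first values visible in the str() output above.
annotations <- data.frame(
  affy_hg_u133_plus_2       = c("232806_s_at", "221904_at",
                                "232806_s_at", "232807_at"),
  unified_uniprot_accession = c("Q8TA84", "Q8TA84", "Q6ZTF8", "Q6ZTF8"),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry an accession.
has_acc <- !is.na(annotations$unified_uniprot_accession) &
  annotations$unified_uniprot_accession != ""

# Group the accessions by probe ID: one list element per probe.
probe2uniprot <- split(annotations$unified_uniprot_accession[has_acc],
                       annotations$affy_hg_u133_plus_2[has_acc])

# Remove repeated accessions within each probe.
probe2uniprot <- lapply(probe2uniprot, unique)

print(probe2uniprot[["232806_s_at"]])
```

A probe can map to several accessions, which is why the data frame has
many more rows than there are unique probe IDs.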

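Should the full query still fail or time out on your side, a possible
fallback is to send the probe IDs in smaller batches and bind the results
together. I have not tested whether this avoids the error you saw, so take
it as a sketch only. The helper fetch_in_chunks and the stand-in fake_fetch
are hypothetical names of mine; they are not biomaRt functions.

```r
# Hypothetical helper: call `fetch` on successive chunks of `ids` and
# row-bind the data frames it returns.
fetch_in_chunks <- function(ids, chunk_size, fetch) {
  if (length(ids) == 0) return(NULL)
  # Chunk number for each ID: 1, 1, ..., 2, 2, ...
  chunk_index <- ceiling(seq_along(ids) / chunk_size)
  chunks <- split(ids, chunk_index)
  results <- lapply(chunks, fetch)
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Offline stand-in for a real query, so the sketch runs without a network.
fake_fetch <- function(chunk) {
  data.frame(id = chunk, n_char = nchar(chunk), stringsAsFactors = FALSE)
}

demo <- fetch_in_chunks(c("a", "bb", "ccc", "dddd", "eeeee"),
                        chunk_size = 2, fetch = fake_fetch)
print(demo)
```

With biomaRt, the fetch argument would be a small function that calls
getBM as in the script above, with values=chunk in place of
values=probeids.
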
Best wishes
 Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

> Dear BioC list,
> 
> I'm getting some very strange behaviour from biomaRt.  The following script works perfectly if I supply a single probe id to the getBM function (i.e. values=probe_ids[1]).  But, when I supply the entire probe_ids list I get the following error.
> 
> Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
>         <not set>
> 
> Once I get the above error, I get a whole bunch of new errors for commands that would have worked before the error.
> examples:
> > annotations=getBM(attributes=c("affy_hg_u133_plus_2", "ensembl_peptide_id","entrezgene", "unified_uniprot_accession", "uniprot_swissprot_accession"), filter="affy_hg_u133_plus_2", values=sample_gene, mart=mart, output="list", na.value="NA")
> Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
>         Couldn't resolve host 'www.biomart.org'
> 
> It seems to kill my connection to biomart until I quit R and start all over again.  Can anyone help?  Is there some kind of timeout for such a large query?  Has anyone gotten this sort of thing to work before or can you suggest an alternative way to map all affy probe ids to uniprot IDs in R?
> 
> ####Start R script#####
> #Load the appropriate libraries
> library(affy)
> library(gcrma)
> library("annotate")
> library("biomaRt")
> library("hgu133plus2")
> 
> setwd("/home/my_dir")
> 
> #just.gcrma method
> #Get file list
> celfiles=list.files(path="/home/my_dir")
> 
> #Do gcrma normalization
> gcrma_exprset=just.gcrma(filenames=celfiles,normalize=TRUE,type="fullmodel",verbose=TRUE,fast=FALSE,optimize.by="memory")
> 
> probe_ids=geneNames(gcrma_exprset)
> 
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> 
> annotations=getBM(attributes=c("affy_hg_u133_plus_2", "ensembl_peptide_id","entrezgene", "unified_uniprot_accession", "uniprot_swissprot_accession"), filter="affy_hg_u133_plus_2", values=probe_ids_nc, mart=mart, output="list", na.value="NA")
> ####End R script#####


