[BioC] possible bug in getBM{biomaRt}
Wolfgang Huber
huber at ebi.ac.uk
Mon Apr 6 18:36:47 CEST 2009
Dear Rhoda & Teresa
thanks! That clarifies - the two query results are linked by
"ensembl_gene_id". There is no need to open two different connections in
biomaRt, so example code including merging and some clean up could look
like:
library("biomaRt")
mart = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl", mart = mart)
p = c("200641_s_at" , "229411_at" , "223376_s_at")
Q1 = getBM(attributes =
c("affy_hg_u133_plus_2","entrezgene","ensembl_gene_id"),
filters = "affy_hg_u133_plus_2", mart=ensembl, values=p)
Q2 = getBM(attributes=c("validated","ensembl_gene_id"),
filters="affy_hg_u133_plus_2", mart=ensembl, values=p)
## clean up:
## Q1 = subset(Q1, !is.na(entrezgene))
Q = merge(Q1,Q2)
Best wishes
Wolfgang
----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
Rhoda Kinsella wrote:
> Hi Wolfgang and Teresa,
> In the query below, Q1 and Q2 are still accessing the same mart
> (mart=ensembl) so this will try to get the attributes from multiple
> pages of the same mart. You will need to set up two distinct mart
> datasets (mart1 and mart2) and as Ensembl is gene centric, you will also
> have to include something for the two datasets to link on (e.g. Ensembl
> gene ID or Ensembl transcript ID as these are present on the two pages
> that you are trying to access). I think you need to do something like this:
>
> > library(biomaRt)
> > mart1 = useMart("ensembl")
> > mart2 = useMart("ensembl")
> > ensembl1 = useDataset("hsapiens_gene_ensembl",mart=mart1);
> Checking attributes and filters ... ok
> > ensembl2 = useDataset("hsapiens_gene_ensembl",mart=mart2);
> Checking attributes and filters ... ok
> > getBM(attributes=c("affy_hg_u133_plus_2","entrezgene",
> "ensembl_gene_id"),
> filters="affy_hg_u133_plus_2",mart=ensembl1,values="205953_at");
>
> affy_hg_u133_plus_2 entrezgene ensembl_gene_id
> 1 205953_at 9860 ENSG00000198799
>
> > getBM(attributes=c("validated",
> "ensembl_gene_id"),filters="affy_hg_u133_plus_2",mart=ensembl2,
> values="205953_at");
> validated ensembl_gene_id
> 1 ENSG00000198799
> 2 hapmap ENSG00000198799
> 3 cluster,freq,submitter,doublehit,hapmap ENSG00000198799
> 4 freq,hapmap ENSG00000198799
> 5 cluster,freq,submitter,hapmap ENSG00000198799
> 6 freq,doublehit,hapmap ENSG00000198799
> 7 cluster,doublehit ENSG00000198799
> 8 cluster ENSG00000198799
> 9 cluster,freq,hapmap ENSG00000198799
> 10 cluster,freq,doublehit,hapmap ENSG00000198799
> 11 cluster,freq,doublehit ENSG00000198799
> 12 freq ENSG00000198799
> 13 doublehit ENSG00000198799
> 14 cluster,hapmap ENSG00000198799
> 15 freq,submitter,hapmap ENSG00000198799
>
> I am sure that there is a more elegant way of doing the linking using
> biomaRt. I hope this helps but if anything is not clear, please don't
> hesitate to get in touch.
> Regards,
> Rhoda
>
>
>> Dear Rhoda, Teresa
>>
>> if I try that suggestion, by:
>>
>>
>> library("biomaRt")
>> mart = useMart("ensembl")
>> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart);
>>
>> library("hgu133plus2.db")
>> EID = toTable(hgu133plus2ENTREZID)
>> set.seed(0xbadbeef)
>> I = sample(nrow(EID), 100)
>> p = EID[I,"probe_id"]
>>
>> Q1=getBM(attributes=c("affy_hg_u133_plus_2","entrezgene"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)
>> Q2=getBM(attributes=c("affy_hg_u133_plus_2","validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)
>>
>>
>> I get:
>>
>> Error in getBM(attributes = c("affy_hg_u133_plus_2", "validated"),
>> filters = "affy_hg_u133_plus_2", :
>> Query ERROR: caught BioMart::Exception::Usage: Attributes from
>> multiple attribute pages are not allowed
>>
>>
>> (Note that this is the same error that Teresa encountered, just that
>> in more recent versions of biomaRt the error message is more informative.)
>>
>> OTOH, if I only do
>>
>> Q3=getBM(attributes=c("validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)
>>
>> this returns successfully, but then Q3 has 27 rows and there seems no
>> way to find out to which of the filter values the rows belong to -
>> i.e., it is useless.
>>
>> So what's the best way to proceed when one wants to query across
>> multiple pages like in Teresa's example?
>>
>> Best wishes
>> Wolfgang
>>
>> ----------------------------------------------------
>> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>>
>>
>> > sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-04-02 r48271)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=la_AU.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices datasets utils methods base
>>
>> other attached packages:
>> [1] hgu133plus2.db_2.2.11 RSQLite_0.7-1 DBI_0.2-4
>> [4] AnnotationDbi_1.5.23 Biobase_2.3.11 biomaRt_1.99.8
>> [7] fortunes_1.3-6
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_0.94-1 XML_2.3-0
>>
>>
>>
>> Rhoda Kinsella wrote:
>>> Hi Teresa,
>>> The Ensembl mart attributes are set up as 4 separate categories or
>>> "pages". If you take a look at the martview interface you will see
>>> that these categories are: Features, Homologs, Structures, Sequences
>>> and Variations. At present, it is not possible to mix attributes from
>>> multiple sections as you will get the error message you received
>>> (i.e. in your query the "validated" attribute came from the
>>> "Variations" section and the rest of the attributes came from the
>>> "Features" section). The way around this is to perform two separate
>>> queries; one to select the features attributes and one to retrieve
>>> the variations attribute. You may also be able to link to two
>>> separate datasets, one for the validated part of the query and the
>>> other for the features part of the query and pull out all the
>>> information you need. I'm not sure how this is done using biomaRt,
>>> but perhaps someone else from the mailing list can help you to do this.
>>> I hope that helps,
>>> Regards,
>>> Rhoda
>>> On 3 Apr 2009, at 15:09, Teresa Colombo wrote:
>>>> Dear list,
>>>>
>>>> it looks like there may be a bug in function 'getBM' affecting the
>>>> use of
>>>> attribute 'validated':
>>>>
>>>> --------------------------------------------------------------------------------------------------------------------
>>>>
>>>>> library(biomaRt)
>>>>> mart = useMart("ensembl")
>>>>> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart);
>>>>
>>>>> library(hgu133plus2.db);
>>>>> EID <- toTable(hgu133plus2ENTREZID);
>>>>> I <- sample(dim(EID)[1],100)
>>>>> p <- EID[I,"probe_id"];
>>>>> head(p)
>>>> [1] "205953_at" "214718_at" "233633_at" "241572_at" "224704_at"
>>>> [6] "221510_s_at"
>>>>
>>>>
>>>>> Q <-
>>>> getBM(attributes=c("affy_hg_u133_plus_2","entrezgene","validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p);
>>>>
>>>>
>>>> V1
>>>> 1 Query ERROR: caught BioMart::Exception::Usage: Attributes from
>>>> multiple
>>>> attribute pages are not allowed
>>>> Errore in getBM(attributes = c("affy_hg_u133_plus_2", "entrezgene",
>>>> "validated"), :
>>>> Number of columns in the query result doesn't equal number of attributes
>>>> in query. This is probably an internal error, please report.
>>>>
>>>>
>>>> --------------------------------------------------------------------------------------------------------------------
>>>>
>>>> The above error message disappears when running the same query after
>>>> having
>>>> removed "validated" from the list of attributes.
>>>> --------------------------------------------------------------------------------------------------------------------
>>>>
>>>>> Q <-
>>>> getBM(attributes=c("affy_hg_u133_plus_2","entrezgene"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p);
>>>>
>>>>> head(Q)
>>>> affy_hg_u133_plus_2 entrezgene
>>>> 1 1552538_a_at NA
>>>> 2 1552538_a_at 221458
>>>> 3 1554485_s_at 140738
>>>> 4 1555097_a_at 5737
>>>> 5 1564015_at NA
>>>> 6 1564198_a_at 118611
>>>>
>>>>
>>>>> sessionInfo()
>>>> R version 2.8.1 (2008-12-22)
>>>> i486-pc-linux-gnu
>>>>
>>>> locale:
>>>> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
>>>>
>>>>
>>>> attached base packages:
>>>> [1] tools stats graphics grDevices utils datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] hgu133plus2.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4
>>>> [4] AnnotationDbi_1.4.3 Biobase_2.2.2 biomaRt_1.16.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] RCurl_0.94-1 XML_2.1-0
>>>> --------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> Best wishes
>>>> teresa
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch <mailto:Bioconductor at stat.math.ethz.ch>
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> Rhoda Kinsella Ph.D.
>>> Ensembl Bioinformatician,
>>> European Bioinformatics Institute (EMBL-EBI),
>>> Wellcome Trust Genome Campus,
>>> Hinxton
>>> Cambridge CB10 1SD,
>>> UK.
>
> Rhoda Kinsella Ph.D.
> Ensembl Bioinformatician,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton
> Cambridge CB10 1SD,
> UK.
>
More information about the Bioconductor
mailing list