[BioC] [BioMart Users] biomaRt returning multiple columns out of order
laurent.gatto at gmail.com
Wed Oct 19 15:52:07 CEST 2011
Any update about the column order in biomaRt results?
I have come across the same issue, as illustrated below.
> mart = useMart("plants_mart_10","athaliana_eg_gene")
> ans <- getBM(attributes=c("tair_locus","peptide"), filter="tair_locus", value=c("AT3G18780","AT2G26300"), mart=mart, verbose=TRUE)
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' uniqueRows = '1' count = '0'
datasetConfigVersion = '0.6' requestid= "biomaRt"> <Dataset name =
'athaliana_eg_gene'><Attribute name = 'tair_locus'/><Attribute name =
'peptide'/><Filter name = 'tair_locus' value = 'AT3G18780,AT2G26300'
> ans tair_locus
I see the same for useMart("ensembl","ensembl_gene_id") using
ensembl_gene_id or ensembl_exon_id as filters.
In these cases, datasetConfigVersion is also 0.6, if that's of any help.
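Until this is sorted out, a cheap sanity check is possible whenever the filter is also one of the requested attributes, as in the query above: the submitted identifiers should come back in the column that carries the filter's name. The sketch below is only an illustration. `columns_matching_filter` is a made-up helper, not part of biomaRt, and the data frame is a simulated result with placeholder sequences, not real server output.

```r
# Hypothetical helper (not part of biomaRt): return the names of the columns
# whose entries all belong to the set of identifiers that were submitted.
columns_matching_filter <- function(res, values) {
  hits <- vapply(
    res,
    function(col) length(col) > 0 && all(as.character(col) %in% values),
    logical(1)
  )
  names(res)[hits]
}

# Simulated getBM()-style result in which the two labels are swapped.
# The "sequences" are placeholders, not real peptides.
res <- data.frame(
  tair_locus = c("PLACEHOLDERSEQA", "PLACEHOLDERSEQB"),
  peptide    = c("AT3G18780", "AT2G26300"),
  stringsAsFactors = FALSE
)
ids <- c("AT3G18780", "AT2G26300")

matched <- columns_matching_filter(res, ids)

# If the column holding the submitted IDs is not the one named after the
# filter, the labels cannot be trusted.
if (!identical(matched, "tair_locus")) {
  warning("Submitted IDs found in column(s): ", paste(matched, collapse = ", "))
}
```

This only detects the problem for queries that echo the filter back as an attribute. It says nothing about the order of the remaining columns.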
R Under development (unstable) (2011-10-13 r57241)
Platform: x86_64-unknown-linux-gnu (64-bit)
 LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
 LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
 LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
 LC_PAPER=C LC_NAME=C
 LC_ADDRESS=C LC_TELEPHONE=C
 LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
loaded via a namespace (and not attached):
 RCurl_1.6-10 XML_3.4-3
On 30 September 2011 23:11, Richard Hayes <rdhayes at lbl.gov> wrote:
> On Fri, Sep 30, 2011 at 2:51 PM, Steffen Durinck <sdurinck at gmail.com> wrote:
>> Hi Richard, Arek,
>> If you set verbose=TRUE in your getBM query you'll see the XML query that
>> is sent to the BioMart server (see below for your example).
>> The order of the attributes in the XML query is usually the same order we
>> get the results back from the BioMart server.
>> However for your example this is not the case and there is no way for
>> biomaRt to know this (Arek correct me if this is not the case), so when we
>> add column names to the returned matrix they will be wrong when the query
>> order is not preserved in the returned result.
>> > multiTest = getBM(attributes= c("organism_name", "transcript_name",
>> "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167",
>> <?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
>> virtualSchemaName = 'default' uniqueRows = '1' count = '0'
>> datasetConfigVersion = '0.6' requestid= "biomaRt"> <Dataset name =
>> 'phytozome'><Attribute name = 'organism_name'/><Attribute name =
>> 'transcript_name'/><Attribute name = 'exon_chrom_start'/><Attribute name =
>> 'exon_chrom_end'/><Filter name = 'orgid' value = '167' /></Dataset></Query>
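To make the mechanism described above concrete, here is a small offline illustration. It is not biomaRt's actual code, just a sketch under the assumption that the response is header-less, tab-separated text and that the names are attached purely by position. The two rows are copied from Richard's output further down.

```r
# Attributes in the order they were requested in the XML query.
requested <- c("organism_name", "transcript_name",
               "exon_chrom_start", "exon_chrom_end")

# Stand-in for a server response: no header line, and the columns arrive in
# a different order than requested (organism name last).
response <- paste(
  "AT5G47220.1\t19171862\t19172823\tAthaliana",
  "AT1G71920.3\t27067059\t27067098\tAthaliana",
  sep = "\n"
)

res <- read.table(text = response, sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE)

# Labelling by position: with no header in the response, nothing here can
# notice that the order differs from the request.
names(res) <- requested

res$organism_name   # holds the transcript names
res$exon_chrom_end  # holds the organism name
```

The values themselves are intact. Only the labels are shifted, which is why single-attribute queries look fine.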
> Okay, I see that on my end as well. Is this a consequence of biomart v0.6 on
> the backend that would be alleviated by our plans to upgrade to 0.7 soon?
>> On Thu, Sep 29, 2011 at 9:08 AM, Arek Kasprzyk <arek.kasprzyk at gmail.com> wrote:
>>> Hi Richard,
>>> the best person to help you is Steffen Durinck, the original biomaRt coder
>>> (cc'ed on this email)
>>> On Wed, Sep 28, 2011 at 3:52 PM, Richard Hayes <rdhayes at lbl.gov> wrote:
>>>> Our group maintains the biomart instance at the Phytozome plant genomics
>>>> portal. We've had some users report problems with the result sets from the
>>>> biomaRt interface. It is unclear if this is a biomaRt problem or a problem
>>>> in our mart configuration. At the moment, we are still running biomart
>>>> version 0.6, but are hoping to upgrade in the very near future to 0.7.
>>>> I had been testing with R 2.12.2 and biomaRt 2.6.0, but then upgraded to
>>>> R 2.13.1 and biomaRt 2.8.1. The problems persist with these latest software
>>>> versions.
>>>> I can successfully connect to our mart and the main genome transcript
>>>> dataset as follows, successfully retrieving a single column of transcript
>>>> names for Arabidopsis thaliana using our internal "orgid" filter for
>>>> organism ID 167:
>>>> > library('biomaRt')
>>>> > phyto=useMart('phytozome_mart', dataset='phytozome')
>>>> > transcripts = getBM(attributes = c("transcript_name"), filters=
>>>> "orgid", values="167", mart=phyto)
>>>> > transcripts[1:5,]
>>>>  "AT2G38230.1" "AT2G39920.2" "AT2G26530.1" "AT2G28630.1" "AT2G19280.1"
>>>> However, when I construct a multicolumn query, the columns are not
>>>> returned in the expected order:
>>>> > multiTest = getBM(attributes= c("organism_name", "transcript_name",
>>>> "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167",
>>>> > multiTest[1:5,]
>>>> organism_name transcript_name exon_chrom_start exon_chrom_end
>>>> 1 AT5G47220.1 19171862 19172823 Athaliana
>>>> 2 AT1G71920.3 27067059 27067098 Athaliana
>>>> 3 AT1G71920.3 27067189 27067401 Athaliana
>>>> 4 AT1G71920.3 27067506 27067589 Athaliana
>>>> 5 AT1G71920.3 27067706 27067860 Athaliana
>>>> Any help diagnosing the source of this problem is much appreciated.
>>>> Best regards,
>>>> Richard D. Hayes, Ph.D.
>>>> Joint Genome Institute / Lawrence Berkeley National Lab
> Richard D. Hayes, Ph.D.
> Joint Genome Institute / Lawrence Berkeley National Lab
[ Laurent Gatto | slashhome.be ]