[BioC] Fwd: biomaRt column order

steffen at stat.Berkeley.EDU
Tue Jul 15 18:50:10 CEST 2008


It looks like my reply about the biomaRt column order didn't make it to the BioC
mailing list.

---------------------------- Original Message ----------------------------
Subject: Re: Fwd: [BioC] biomaRt column order
From:    steffen at stat.berkeley.edu
Date:    Thu, July 10, 2008 9:13 pm
To:      "Mark Robinson" <mrobinson at wehi.EDU.AU>
Cc:      bioconductor at stat.math.ethz.ch
--------------------------------------------------------------------------

Hi Mark,

The main problem here is that your query retrieves attributes from different
attribute pages. The webservice does not support this, even though such
queries are possible and useful, especially for what we do in Bioconductor.

To get an idea of what attribute pages are, you could check out the BioMart
web interfaces, e.g. at http://www.ensembl.org

They group attributes of a similar type together so they can be displayed
on one web page. This makes less sense for command-line use such as biomaRt.

The column names are returned by the webservice, so this problem will have
to be fixed there. In the meantime, if you request chromosome_name and
ensembl_gene_id through their equivalents on the Sequences attribute page,
the query should return the column names correctly.
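Before sending a query, one could check that all requested attributes come from a single page. This is a minimal sketch under stated assumptions: the attribute table is a hand-made mock with `name` and `page` columns (recent biomaRt versions can report each attribute's page from `listAttributes()`; biomaRt 1.14 may not), the page names `feature_page` and `sequences` are assumptions, and the helpers `pagesFor()` and `singlePage()` are hypothetical, not part of biomaRt. It also assumes each attribute name appears on only one page, as in this mock.

```r
# Hypothetical mock of an attribute table with name/page columns.
# Page names are assumptions, not taken from a live BioMart.
attrs <- data.frame(
  name = c("ensembl_gene_id", "chromosome_name",
           "sequence_gene_stable_id", "sequence_str_chrom_name",
           "sequence_biotype", "sequence_exon_chrom_start",
           "sequence_exon_chrom_end"),
  page = c("feature_page", "feature_page", rep("sequences", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the distinct pages the requested attributes belong to.
pagesFor <- function(requested, attrs) {
  unique(attrs$page[attrs$name %in% requested])
}

# Hypothetical helper: TRUE when every requested attribute is known
# and all of them sit on one page.
singlePage <- function(requested, attrs) {
  all(requested %in% attrs$name) && length(pagesFor(requested, attrs)) == 1
}

# Attributes from the original query (mixes two pages).
mixed <- c("ensembl_gene_id", "chromosome_name", "sequence_biotype",
           "sequence_exon_chrom_start", "sequence_exon_chrom_end")
# Attributes from the rewritten query (Sequences page only).
fixed <- c("sequence_gene_stable_id", "sequence_str_chrom_name",
           "sequence_biotype", "sequence_exon_chrom_start",
           "sequence_exon_chrom_end")

singlePage(mixed, attrs)  # FALSE: spans two pages
singlePage(fixed, attrs)  # TRUE
```

With a real mart, the mock table would be replaced by the attribute listing the installed biomaRt version actually returns, provided it includes page information.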

To list with biomaRt all attributes that belong to one page, you could do:

listAttributes(mart, category="Sequences")

If you change your query as follows, the column names should be in the
correct order:

b <- getBM(attributes = c("sequence_gene_stable_id", "sequence_str_chrom_name",
                          "sequence_biotype", "sequence_exon_chrom_start",
                          "sequence_exon_chrom_end"),
           filters = "ensembl_gene_id", values = "ENSG00000197530",
           mart = mart)

You'll get:

    gene_stable_id str_chrom_name struct_biotype exon_chrom_start exon_chrom_end
1  ENSG00000197530              1 protein_coding          1540747        1540876
2  ENSG00000197530              1 protein_coding          1541751        1541857
3  ENSG00000197530              1 protein_coding          1548632        1548942
4  ENSG00000197530              1 protein_coding          1549017        1549188
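Mark's own workaround, reshuffling the columns afterwards, can also be sketched. This minimal example builds a mock data frame from the first two rows of the mislabeled output in the quoted message below (values copied from that output) and moves the data back under the intended names. The permutation `c(4, 5, 1, 2, 3)` is read off that particular output and is an assumption for any other query; requesting attributes from a single page, as in the query above, avoids the need for it.

```r
# Mock of the first two rows of the mislabeled result from the original query.
# The data sit in the order biotype, exon start, exon end, gene id,
# chromosome, while the column names follow the requested order.
b <- data.frame(
  ensembl_gene_id  = c("protein_coding", "protein_coding"),
  chromosome_name  = c(1542803, 1548674),
  struct_biotype   = c(1542958, 1548942),
  exon_chrom_start = c("ENSG00000197530", "ENSG00000197530"),
  exon_chrom_end   = c(1, 1),
  stringsAsFactors = FALSE
)

# Reorder the columns so the data follow the requested attribute order
# (permutation read off this output only), then restore the intended names.
fixedB <- b[, c(4, 5, 1, 2, 3)]
names(fixedB) <- c("ensembl_gene_id", "chromosome_name", "sequence_biotype",
                   "sequence_exon_chrom_start", "sequence_exon_chrom_end")
fixedB
```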



Cheers,
Steffen

>
>
> Begin forwarded message:
>
>> From: Mark Robinson <mrobinson at wehi.EDU.AU>
>> Date: 5 July 2008 9:13:48 AM
>> To: bioconductor at stat.math.ethz.ch
>> Subject: [BioC] biomaRt column order
>>
>> Dear list.
>>
>> I'm using biomaRt to do a fairly simple query against the Ensembl
>> human database.  I get returned a table with column names that don't
>> match the data in the columns.  See below.
>>
>> I can reshuffle them afterwards to make them match, but that's not ideal.
>>
>> Am I doing something wrong?
>>
>> Thanks,
>> Mark
>>
>>
>>
>>
>> > library(biomaRt)
>> Loading required package: RCurl
>> > mart=useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
>> Checking attributes and filters ... ok
>> > mart
>> Object of class 'Mart':
>> Using the ensembl BioMart database
>> Using the hsapiens_gene_ensembl dataset
>> > b <- getBM(c("ensembl_gene_id", "chromosome_name", "sequence_biotype",
>> +              "sequence_exon_chrom_start", "sequence_exon_chrom_end"),
>> +            filters = "ensembl_gene_id", values = "ENSG00000197530", mart = mart)
>> > dim(b)
>> [1] 25  5
>> > b[1:10,]
>>    ensembl_gene_id chromosome_name struct_biotype exon_chrom_start exon_chrom_end
>> 1   protein_coding         1542803        1542958  ENSG00000197530              1
>> 2   protein_coding         1548674        1548942  ENSG00000197530              1
>> 3   protein_coding         1549017        1549188  ENSG00000197530              1
>> 4   protein_coding         1550038        1550144  ENSG00000197530              1
>> 5   protein_coding         1550234        1550428  ENSG00000197530              1
>> 6   protein_coding         1550529        1550671  ENSG00000197530              1
>> 7   protein_coding         1551893        1551997  ENSG00000197530              1
>> 8   protein_coding         1552080        1552242  ENSG00000197530              1
>> 9   protein_coding         1552317        1552450  ENSG00000197530              1
>> 10  protein_coding         1552539        1552687  ENSG00000197530              1
>> > sessionInfo()
>> R version 2.7.1 (2008-06-23)
>> i386-apple-darwin8.10.1
>>
>> locale:
>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_1.14.0 RCurl_0.9-3
>>
>> loaded via a namespace (and not attached):
>> [1] XML_1.95-2
>>
>>
>>
>> ------------------------------
>> Mark Robinson
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robinson at garvan.org.au
>> e: mrobinson at wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


