[Bioc-devel] biomaRt::getBM column names

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 12 02:43:49 CEST 2013


On 06/07/2013 09:39 PM, Steffen Durinck wrote:
> Hi Martin,
>
> The original behaviour is offered through bmHeader = FALSE in the getBM query.
> Below is the long story why this change came about (it would be good to hear
> which solution is preferred by others):

Hi Steffen -- thanks for the response. I saw the bmHeader flag, but the
documentation made it sound like something I'd use only if the request failed:

           ... TRUE.  This should only be switched off if the default
           behavior results in errors, setting to off might still be
           able to retrieve your data in that case

but from the description below it sounds like it is appropriate and safe for
within-database queries when listAttributes() shows that there is a one-to-one
relationship between the 'name' attributes used in the query and the
corresponding 'description' of the attributes.
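The one-to-one condition can be checked mechanically. The sketch below is only an illustration: `is_one_to_one()` is a hypothetical helper, not part of biomaRt, and `attrs` is invented toy data standing in for the data.frame returned by `listAttributes(mart)`, which is assumed to carry 'name' and 'description' columns. The attributes `attr_a` and `attr_b` and their shared description are made up to show the ambiguous case.

```r
## Toy stand-in (hypothetical) for listAttributes(mart); the real table is
## assumed to have 'name' and 'description' columns.
attrs <- data.frame(
  name = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band",
           "attr_a", "attr_b"),
  description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                  "Chromosome Name", "Band",
                  "Shared description", "Shared description"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: TRUE when every requested attribute name is known and
## its description belongs to no other attribute, so names and descriptions
## can be translated in either direction without ambiguity.
is_one_to_one <- function(requested, attrs) {
  desc <- attrs$description[match(requested, attrs$name)]
  if (anyNA(desc))
    return(FALSE)  # at least one requested name is unknown
  ## how many attributes carry each requested description
  counts <- vapply(desc, function(d) sum(attrs$description == d), integer(1))
  all(counts == 1L)
}

is_one_to_one(c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band"),
              attrs)                            # TRUE
is_one_to_one(c("hgnc_symbol", "attr_a"), attrs) # FALSE: description shared
```

A TRUE result only says that names and descriptions line up; according to the reply quoted below, bmHeader=FALSE can still mislabel columns for queries across multiple datasets, where the result order is not predictable.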

Martin

>
> In most cases getBM returns the result in the order of the attributes in the
> input query, so getBM used to use the attributes vector as the column names
> of the query result. This return order is, however, not preserved for
> queries over multiple datasets, e.g. mouse and human. In that case the order
> of the result cannot be predicted, and the column names would not match the
> fields actually returned. So there was a push for getBM to use the header
> information provided by the BioMart service, which is available upon
> request. This ensures that the column names are always correct.
> The downside is that the column names returned by the BioMart service are
> not the attribute names but their descriptions, so instead of a column name
> 'affy_hg_u95av2' we get 'Affy HG U95AV2 probeset'. To keep the column naming
> as it used to be, I then mapped each attribute description back to its
> attribute name and used that name as the column name of the query result.
> This worked until I discovered that attribute descriptions are not unique,
> so there is no one-to-one mapping from a description to an attribute name,
> and this made the getBM code crash. I then decided that the best default is
> to use the headers provided by the BioMart service, so that queries never
> crash due to problems on the R side. To enable attribute naming as it was
> originally done, I added the bmHeader=FALSE option. This will be correct in
> most uses, except for queries across multiple datasets.
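The description-to-name mapping described above, and why duplicated descriptions defeat it, can be sketched in base R. `desc_to_name()` is a hypothetical helper, not the biomaRt implementation, and `attrs` is invented toy data in the assumed shape of `listAttributes(mart)` output. Rather than failing, this sketch keeps any description that does not map to exactly one name, in the spirit of never letting the R side crash.

```r
## Toy stand-in (hypothetical) for listAttributes(mart).
attrs <- data.frame(
  name = c("affy_hg_u95av2", "hgnc_symbol", "attr_a", "attr_b"),
  description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                  "Shared description", "Shared description"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: translate BioMart header descriptions back to
## attribute names. A description matching zero or several names cannot be
## translated safely, so it is returned unchanged instead of raising an error.
desc_to_name <- function(header, attrs) {
  vapply(header, function(h) {
    hits <- attrs$name[attrs$description == h]
    if (length(hits) == 1L) hits else h
  }, character(1), USE.NAMES = FALSE)
}

header <- c("Affy HG U95AV2 probeset", "HGNC symbol", "Shared description")
desc_to_name(header, attrs)
## -> "affy_hg_u95av2", "hgnc_symbol", "Shared description"
```

The trade-off mirrors the thread: the service headers are always safe but differ from the requested names, while name-based labels are what callers expect but are only trustworthy when the result order is predictable.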
>
> Best,
> Steffen
>
>
>
> On Fri, Jun 7, 2013 at 5:31 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>     Hi Steffen --
>
>     getBM now returns the 'description' rather than 'name' of biomaRt columns, e.g.,
>
>           ## connect to Ensembl and select the human gene dataset
>           mart <- useMart("ensembl")
>           datasets <- listDatasets(mart)
>           mart <- useDataset("hsapiens_gene_ensembl", mart)
>           ## look up three probesets by their Affymetrix HG U95Av2 IDs
>           df <- getBM(attributes=c("affy_hg_u95av2", "hgnc_symbol",
>                                    "chromosome_name", "band"),
>                       filters="affy_hg_u95av2",
>                       values=c("1939_at", "1503_at", "1454_at"),
>                       mart=mart)
>
>     returns
>
>      > df ## devel
>        Affy HG U95AV2 probeset HGNC symbol Chromosome Name   Band
>     1                 1939_at        TP53              17  p13.1
>     2                 1503_at       BRCA2              13  q13.1
>     3                 1454_at       SMAD3              15 q22.33
>
>     rather than
>
>      > df  ## release
>        affy_hg_u95av2 hgnc_symbol chromosome_name   band
>     1        1939_at        TP53              17  p13.1
>     2        1503_at       BRCA2              13  q13.1
>     3        1454_at       SMAD3              15 q22.33
>
>     This makes it difficult to access columns via df$... (breaking code in at
>     least a couple of packages) and it is a little confusing to ask for
>     'affy_hg_u95av2' but get 'Affy HG U95AV2 probeset'. I wonder if the original
>     behaviour could be offered, either as an option or as a similarly named
>     function, or (my preference) the new behavior could be provided by something
>     like getBiomart() -- fancy function name for fancy column names?
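The access problem can be reproduced without a network connection. This sketch rebuilds two columns of the devel-style result shown above by hand, using the values from that output, so it only illustrates column access and does not call biomaRt. `check.names = FALSE` is needed because `data.frame()` would otherwise rewrite the spaces in the names.

```r
## Recreate part of the devel-style result above (values copied from that
## output); check.names = FALSE keeps the spaces in the column names.
df <- data.frame(
  "Affy HG U95AV2 probeset" = c("1939_at", "1503_at", "1454_at"),
  "HGNC symbol" = c("TP53", "BRCA2", "SMAD3"),
  check.names = FALSE, stringsAsFactors = FALSE
)

df$hgnc_symbol     # NULL: code written against the release names breaks
df$`HGNC symbol`   # works, but needs backquotes because of the space
df[["HGNC symbol"]]  # equivalent string-based access
```

According to Steffen's reply, passing bmHeader = FALSE to getBM restores the release-style names, so df$hgnc_symbol works again for single-dataset queries.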
>
>     Martin
>     --
>     Computational Biology / Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N.
>     PO Box 19024 Seattle, WA 98109
>
>     Location: Arnold Building M1 B861
>     Phone: (206) 667-2793
>
>




