[Bioc-devel] biomaRt::getBM column names
Martin Morgan
mtmorgan at fhcrc.org
Wed Jun 12 02:43:49 CEST 2013
On 06/07/2013 09:39 PM, Steffen Durinck wrote:
> Hi Martin,
>
> The original behaviour is offered through bmHeader = FALSE in the getBM query.
> Below is the long story why this change came about (it would be good to hear
> which solution is preferred by others):
Hi Steffen -- thanks for the response. I saw the bmHeader flag but the
documentation made it sound like something I'd use if the request failed:
TRUE. This should only be switched off if the default
behavior results in errors, setting to off might still be
able to retrieve your data in that case
but from the description below it sounds like it is appropriate and safe for
within-database queries when listAttributes() shows that there is a one-to-one
relationship between the 'name' attributes used in the query and the
corresponding 'description' of the attributes.
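
A minimal sketch of that one-to-one check, assuming the attribute table has the 'name' and 'description' columns that listAttributes() shows. The helper isOneToOne() and the two "example_attr" rows are hypothetical and are not part of biomaRt; the table is a small stand-in so the check runs without a BioMart connection.

```r
# Hypothetical stand-in for the data frame returned by listAttributes(mart).
# The last two rows are invented to show a shared description.
attrs <- data.frame(
    name = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band",
             "example_attr_a", "example_attr_b"),
    description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                    "Chromosome Name", "Band",
                    "Example description", "Example description"),
    stringsAsFactors = FALSE
)

# TRUE when each requested attribute has a description that no other
# attribute shares, i.e. name <-> description is one-to-one for the query.
isOneToOne <- function(requested, attrs) {
    stopifnot(all(requested %in% attrs$name))
    counts <- table(attrs$description)
    wanted <- attrs$description[match(requested, attrs$name)]
    all(counts[wanted] == 1L)
}

query <- c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band")
isOneToOne(query, attrs)                               # TRUE
isOneToOne(c("hgnc_symbol", "example_attr_a"), attrs)  # FALSE
```

With a real mart one would pass listAttributes(mart) in place of the stand-in table before deciding whether bmHeader = FALSE is safe for a given query.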
Martin
>
> In most cases getBM returns the result in the order of the attributes in the
> input query. So what getBM used to do is make the attributes vector the column
> names of the query result. This return order is however not preserved in
> instances where one does a query over multiple datasets e.g. mouse and human.
> In that case one can not predict the order of the result and this would make
> the column names not match the actually returned fields. So there was a push
> that getBM uses the header information provided by the BioMart service which is
> available upon request. This ensures that the column names are always correct.
> The downside though is that the column names returned by the BioMart service
> are not the attribute name but its description, so instead of a column name
> 'affy_hg_u95av2' we get 'Affy HG U95AV2 probeset'. To keep the column naming
> as it used to be, I then would map the attribute description back to the
> attribute name and then use the corresponding attribute name as column name for
> the query result. This worked until I discovered that the attribute
> descriptions are not unique, so there is no one to one mapping from a
> description to an attribute name and this made the getBM code crash. I then
> decided that the best thing to do is by default to use the headers provided by
> the BioMart service to ensure queries never crash due to problems on the R side.
> And to enable attribute naming as it originally was done I added the
> bmHeader=FALSE option. This will be correct in most uses except for queries
> across multiple datasets.
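
The description-to-name mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the code inside getBM: restoreNames() is a hypothetical helper, the attribute table is an invented stand-in for listAttributes() output, and keeping the description for ambiguous columns is just one possible fallback.

```r
# Invented stand-in for listAttributes(mart); two attributes share a description.
attrs <- data.frame(
    name = c("affy_hg_u95av2", "hgnc_symbol",
             "example_attr_a", "example_attr_b"),
    description = c("Affy HG U95AV2 probeset", "HGNC symbol",
                    "Example description", "Example description"),
    stringsAsFactors = FALSE
)

# Hypothetical helper (not part of biomaRt): rename result columns from
# descriptions back to attribute names wherever the mapping is unambiguous.
restoreNames <- function(result, attrs) {
    hdr <- colnames(result)
    # Descriptions shared by several attributes cannot be mapped back safely.
    ambiguous <- attrs$description[duplicated(attrs$description)]
    idx <- match(hdr, attrs$description)
    ok <- !is.na(idx) & !(hdr %in% ambiguous)
    hdr[ok] <- attrs$name[idx[ok]]
    colnames(result) <- hdr
    result
}

# A result as it might come back with header descriptions as column names.
res <- data.frame("HGNC symbol" = "TP53", "Example description" = "x",
                  check.names = FALSE, stringsAsFactors = FALSE)
out <- restoreNames(res, attrs)
colnames(out)  # "hgnc_symbol" "Example description"
```

Columns whose description is shared keep the header text instead of raising an error, which sidesteps the crash at the cost of mixed naming styles in one result.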
>
> Best,
> Steffen
>
>
>
> On Fri, Jun 7, 2013 at 5:31 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> Hi Steffen --
>
> getBM now returns the 'description' rather than 'name' of biomaRt columns, e.g.,
>
> mart <- useMart("ensembl")
> datasets <- listDatasets(mart)
> mart <- useDataset("hsapiens_gene_ensembl", mart)
> df <- getBM(attributes=c("affy_hg_u95av2", "hgnc_symbol",
>                          "chromosome_name", "band"),
>             filters="affy_hg_u95av2",
>             values=c("1939_at", "1503_at", "1454_at"),
>             mart=mart)
>
> returns
>
> > df ## devel
>   Affy HG U95AV2 probeset HGNC symbol Chromosome Name   Band
> 1                 1939_at        TP53              17  p13.1
> 2                 1503_at       BRCA2              13  q13.1
> 3                 1454_at       SMAD3              15 q22.33
>
> rather than
>
> > df ## release
>   affy_hg_u95av2 hgnc_symbol chromosome_name   band
> 1        1939_at        TP53              17  p13.1
> 2        1503_at       BRCA2              13  q13.1
> 3        1454_at       SMAD3              15 q22.33
>
> This makes it difficult to access columns via df$... (breaking code in at
> least a couple of packages) and it is a little confusing to ask for
> 'affy_hg_u95av2' but get 'Affy HG U95AV2 probeset'. I wonder if the original
> behaviour could be offered, either as an option or as a similarly named
> function, or (my preference) the new behavior could be provided by something
> like getBiomart() -- fancy function name for fancy column names?
>
> Martin
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793