[BioC] Question about mget vs. select for annotation package
Marc Carlson
mcarlson at fhcrc.org
Tue Jul 9 13:19:12 CEST 2013
Hi Christina,
The basic problem is that the bimap interface was created to emulate an
even older set of environments. For those older platforms, people
initially had little interest in keys (probe IDs) that mapped to
multiple different things, because the keys were usually probe IDs from
microarrays, and a microarray probe that maps to multiple targets is
probably just a bad probe... So software was written with that
limitation in mind, time marched on, and if we changed the default now,
some of that old code might break. Later on, when people started to use
these bimaps for things other than microarrays, we kept the
multiple-probe limitation for backwards compatibility and provided the
toggleProbes() method that Herve mentioned, so that people could get
that data if they wanted it.
By the time we wrote the newer select() interface, the world had moved
on: people were doing both microarrays and a host of other things such
as high-throughput sequencing, and annotation was mostly something done
at the end of an analysis, usually just to decorate a data.frame object.
So with select() we were free to always expose all the data for a probe
or gene, and simply warn the user when they get back more rows than
expected (only when that actually happens). select() was really designed
to be a more general annotation tool. At this time, we hope that most
people will use select(), which offers a simpler way to access this
data. We still provide the older bimap interface, mostly for the sake of
backwards compatibility.
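
To make the contrast concrete, here is a minimal base-R sketch of the
two behaviours. It does not use AnnotationDbi at all: probe2gene,
lookup_bimap_style() and lookup_select_style() are hypothetical names
invented for illustration, the "toy_single_at" row is made up, and only
the six Entrez IDs for 213801_x_at come from this thread. It is a rough
emulation of the design choice, not how the package is implemented.

```r
# Toy probe-to-gene table. The 213801_x_at rows are taken from the thread;
# "toy_single_at" / "toy_gene_1" are made-up values for a single-gene probe.
probe2gene <- data.frame(
  PROBEID  = c(rep("213801_x_at", 6), "toy_single_at"),
  ENTREZID = c("3921", "388524", "574040", "6044", "653162", "730029",
               "toy_gene_1"),
  stringsAsFactors = FALSE
)

# Bimap-style lookup (hypothetical helper): by default, a probe that hits
# more than one gene is reported as NA. multi = TRUE loosely mimics the
# effect of asking for the full mapping with toggleProbes(map, "all").
lookup_bimap_style <- function(keys, tab, multi = FALSE) {
  res <- lapply(keys, function(k) {
    hits <- tab$ENTREZID[tab$PROBEID == k]
    if (length(hits) == 0L || (length(hits) > 1L && !multi)) {
      NA_character_
    } else {
      hits
    }
  })
  names(res) <- keys
  res
}

# select()-style lookup (hypothetical helper): always return every matching
# row, and warn when a key matched more than one row.
lookup_select_style <- function(keys, tab) {
  out <- tab[tab$PROBEID %in% keys, , drop = FALSE]
  if (anyDuplicated(out$PROBEID) > 0L) {
    warning("lookup resulted in 1:many mapping between keys and return rows")
  }
  rownames(out) <- NULL
  out
}

# Default bimap-style behaviour hides the multiple probe behind NA.
hidden <- lookup_bimap_style("213801_x_at", probe2gene)

# Opting in returns all six IDs for the same probe.
full <- lookup_bimap_style("213801_x_at", probe2gene, multi = TRUE)

# select()-style returns all rows; the warning is caught and printed here
# so the script keeps running.
rows <- withCallingHandlers(
  lookup_select_style("213801_x_at", probe2gene),
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

The point of the sketch is only the default: the older style needs an
explicit opt-in to see multiple matches, while the newer style returns
everything and leaves it to the caller to decide what to do with the
extra rows.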
Marc
On 07/02/2013 03:26 PM, Hervé Pagès wrote:
> Hi Christina,
>
> In AnnotationDbi jargon, a probe that matches multiple genes is called
> a multiple probe. When using the classic Bimap API, multiple probes are
> mapped to NA by default. Unless you use toggleProbes() on the Bimap
> object to request the full mapping:
>
> > map <- toggleProbes(hgu133plus2ENTREZID, "all")
>
> > mget("213801_x_at", map)
> $`213801_x_at`
> [1] "3921" "388524" "574040" "6044" "653162" "730029"
>
> Personally I think that making multiple probes appear as if they're
> not mapped to any gene is not doing any good. Hopefully at some point
> this can be reconsidered.
>
> Cheers,
> H.
>
>
> On 07/02/2013 02:53 PM, Christina Chaivorapol wrote:
>> Hi,
>>
>> I seem to be getting different results depending on whether I use
>> select() or mget() with the hgu133plus2.db package for a probe with a
>> one-probe-to-many-genes mapping. Does anyone know why there is a
>> discrepancy?
>>
>>> select(hgu133plus2.db, keys="213801_x_at", cols=c("ENTREZID", "SYMBOL"), keytype="PROBEID")
>>       PROBEID ENTREZID  SYMBOL
>> 1 213801_x_at     3921    RPSA
>> 2 213801_x_at   388524 RPSAP58
>> 3 213801_x_at   574040  SNORA6
>> 4 213801_x_at     6044 SNORA62
>> 5 213801_x_at   653162  RPSAP9
>> 6 213801_x_at   730029 RPSAP19
>> Warning message:
>> In .generateExtraRows(tab, keys, jointype) :
>> 'select' resulted in 1:many mapping between keys and return rows
>>
>>> mget("213801_x_at", hgu133plus2ENTREZID)
>> $`213801_x_at`
>> [1] NA
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] hgu133plus2.db_2.9.0 org.Hs.eg.db_2.9.0 RSQLite_0.11.3
>> [4] DBI_0.2-6 AnnotationDbi_1.22.3 Biobase_2.20.0
>> [7] BiocGenerics_0.6.0 limma_3.16.2
>>
>> loaded via a namespace (and not attached):
>> [1] IRanges_1.18.0 stats4_3.0.0 tools_3.0.0
>>
>> Thanks,
>> Christina
>>
>