[Bioc-devel] idempotent identifier mapping with GSEABase::mapIdentifiers()

Fri Mar 2 19:17:04 CET 2012

On 02/27/2012 09:08 AM, Robert Castelo wrote:
> thanks Vincent,
>
> what you suggest fixes the situation temporarily and hopefully, as
> Martin said in his message before, this can have a more generic
> solution.

This is fixed for Entrez -> Annotation identifier maps in 1.17.3 (in the 
development branch). There may be other idempotent maps that are not 
supported.
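
For maps that are still unsupported, a caller can guard against the
identity case before looking anything up. What follows is only a minimal
base-R sketch of that idea, not the GSEABase fix: map_ids() and the toy
lookup environment are hypothetical names invented for illustration, and
the lookup values are placeholders rather than real annotation data.

```r
# Hypothetical helper (not part of GSEABase or AnnotationDbi): translate
# identifiers through a lookup environment, but return the input unchanged
# when the source and target identifier types are the same.
map_ids <- function(ids, from, to, lookup = NULL) {
  if (identical(from, to)) {
    return(ids)                      # idempotent case: nothing to translate
  }
  if (is.null(lookup)) {
    stop("no lookup table supplied for ", from, " -> ", to)
  }
  # mget() returns a list; identifiers without a match become NA
  hits <- mget(ids, envir = lookup, ifnotfound = NA_character_)
  unlist(hits, use.names = FALSE)
}

# Toy lookup table standing in for an annotation map
# (placeholder values, not real gene symbols)
toy <- new.env(hash = TRUE)
assign("1000", "SYMBOL_A", envir = toy)
assign("1001", "SYMBOL_B", envir = toy)

same   <- map_ids(c("1000", "1001"), "ENTREZID", "ENTREZID")
mapped <- map_ids(c("1000", "9999"), "ENTREZID", "SYMBOL", toy)
```

The identity check comes first so that no map object has to exist for the
same-type case, which is the situation that triggers the 'not found'
error reported below.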

> your suggestion makes me think that, in fact, it could be of general
> interest to add a *ENTREZID (identity) map for every entrez-based
> organism-level annotation package. i think this could be useful in every
> situation in which one would like to programmatically retrieve the
> entrez id of a feature using any annotation package without knowing
> whether the feature is already an entrez id.

Some of this is addressed using the 'select' interface in the 
forthcoming version of AnnotationDbi (thanks, Marc!) where one can 
specify 'keys' and 'cols' that typically allow for idempotent maps.

Martin

>
> robert.
>
>
> On Mon, 2012-02-27 at 07:45 -0500, Vincent Carey wrote:
>> I have run into a very similar situation.  Ultimately a uniformization
>> of the annotation API will be called for.
>> I wonder if a global short-term fixup would get you through this
>> situation?
>>
>>> org.Hs.egENTREZID = new.env(hash=TRUE)
>>> k = mappedkeys(org.Hs.egENSEMBL)  # or any other good source of all keys
>>> for (i in seq_along(k)) assign(k[i], k[i], org.Hs.egENTREZID)
>>> get("1000", org.Hs.egENTREZID)
>> [1] "1000"
>>
>>
>> On Mon, Feb 27, 2012 at 6:25 AM, Robert Castelo
>> <robert.castelo at upf.edu>  wrote:
>>          hi,
>>
>>          i collaborate in maintaining the packages GSVA and GSVAdata and
>>          i have a question about the function mapIdentifiers() from the
>>          GSEABase package, which i'm going to illustrate through an
>>          example.
>>
>>
>>          1. let's first build an ExpressionSet object whose annotation
>>          slot points to the human organism-level annotation package
>>          org.Hs.eg.db:
>>
>>          library(Biobase)
>>          library(org.Hs.eg.db)
>>
>>          mapped_genes <- mappedkeys(org.Hs.egSYMBOL)
>>
>>          exp <- matrix(rnorm(1000), nrow=100,
>>                        dimnames=list(mapped_genes[1:100],
>>                                      paste("sample", 1:10, sep="")))
>>          eset <- new("ExpressionSet", exprs=exp,
>>                      annotation="org.Hs.eg.db")
>>          eset
>>          ExpressionSet (storageMode: lockedEnvironment)
>>          assayData: 100 features, 10 samples
>>           element names: exprs
>>          protocolData: none
>>          phenoData: none
>>          featureData: none
>>          experimentData: use 'experimentData(object)'
>>          Annotation: org.Hs.eg.db
>>
>>          2. now i'm going to load the Broad gene sets stored as a
>>          GeneSetCollection object in the experimental data package
>>          GSVAdata:
>>
>>          library(GSVAdata)
>>          data(c2BroadSets)
>>          c2BroadSets
>>          GeneSetCollection
>>           names: NAKAMURA_CANCER_MICROENVIRONMENT_UP,
>>          NAKAMURA_CANCER_MICROENVIRONMENT_DN, ...,
>>          ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY (3272 total)
>>           unique identifiers: 5167, 100288400, ..., 57191 (29340 total)
>>           types in collection:
>>             geneIdType: EntrezIdentifier (1 total)
>>             collectionType: BroadCollection (1 total)
>>
>>
>>          3. finally, i'd like to obtain a new GeneSetCollection object
>>          whose identifiers have been mapped between the two classes of
>>          identifiers in the GeneSetCollection and the ExpressionSet
>>          objects.
>>
>>          in this case both objects actually work with the same class of
>>          identifiers (Entrez), so in fact i don't need to do that, but
>>          this operation forms part of a piece of code in the package
>>          GSVA which i'd like to work regardless of the kind of
>>          annotation package referred to in the ExpressionSet object. i
>>          had expected that the function mapIdentifiers() would have some
>>          kind of idempotent behavior, but i get the following error:
>>
>>          gsc <- mapIdentifiers(c2BroadSets,
>>                                AnnotationIdentifier(annotation(eset)))
>>          Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ...,
>>          verbose = verbose)) :
>>           error in evaluating the argument 'object' in selecting a
>>          method for function 'GeneSetCollection': Error in get(mapName,
>>          envir = pkgEnv, inherits = FALSE) :
>>           object 'org.Hs.egENTREZID' not found
>>
>>
>>          which does not occur if the feature names and annotation of
>>          the ExpressionSet correspond to a classical affy chip (e.g.
>>          "hgu95av2").
>>
>>          i built the object c2BroadSets in the experiment data package
>>          GSVAdata by importing the entire xml file from the Broad sets,
>>          so i guess it is also possible that i did something wrong when
>>          i built this 'c2BroadSets' object and there's no problem, bug
>>          or missing feature in mapIdentifiers().
>>
>>          i look forward to your diagnosis and suggestions in any of
>>          these possible directions.
>>
>>
>>          thanks,
>>          robert.
>>
>>          _______________________________________________
>>          Bioc-devel at r-project.org mailing list
>>          https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793