[Bioc-devel] idempotent identifier mapping with GSEABase::mapIdentifiers()

Mon Feb 27 18:08:05 CET 2012

thanks Vincent,

what you suggest fixes the situation for now and hopefully, as Martin
said in his earlier message, a more generic solution can follow.

your suggestion makes me think that, in fact, it could be of general
interest to add a *ENTREZID (identity) map to every Entrez-based
organism-level annotation package. i think this would be useful in any
situation in which one would like to programmatically retrieve the
Entrez id of a feature using any annotation package, without knowing
whether the feature is already an Entrez id.
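
to make the idea concrete, here is a minimal sketch in base R. plain
environments stand in for the annotation maps, so nothing below is a
real Bioconductor object: the names chipENTREZID, orgENTREZID and
to_entrez() are made up for illustration and the ids are toy values.

```r
# Toy stand-in for a chip-level map: feature (probe) id -> Entrez id.
chipENTREZID <- list2env(list(probe_1 = "1000", probe_2 = "10"))

# Toy stand-in for the proposed identity map in an Entrez-based
# organism-level package: Entrez id -> the same Entrez id.
entrez_ids <- c("1", "10", "1000")
orgENTREZID <- list2env(setNames(as.list(entrez_ids), entrez_ids))

# Hypothetical helper: look features up in whatever *ENTREZID map the
# annotation provides; unknown features come back as NA.
to_entrez <- function(features, entrez_map) {
  unlist(mget(features, envir = entrez_map, ifnotfound = NA_character_))
}

from_chip <- to_entrez(c("probe_1", "probe_2"), chipENTREZID)
from_org  <- to_entrez(c("10", "1000"), orgENTREZID)
```

with the identity map in place the caller uses the same lookup in both
cases, and applying to_entrez() to ids that are already Entrez ids
simply returns them unchanged.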

robert.

On Mon, 2012-02-27 at 07:45 -0500, Vincent Carey wrote:
> I have run into a very similar situation.  Ultimately a uniformization
> of the annotation API will be called for.
> I wonder if a global short-term fixup would get you through this
> situation?
> 
> > org.Hs.egENTREZID = new.env(hash=TRUE)
> > k = mappedkeys(org.Hs.egENSEMBL)  # or any other good source of all keys
> > for (i in seq_along(k)) assign(k[i], k[i], envir = org.Hs.egENTREZID)
> > get("1000", org.Hs.egENTREZID)
> [1] "1000"
> 
> 
> On Mon, Feb 27, 2012 at 6:25 AM, Robert Castelo
> <robert.castelo at upf.edu> wrote:
>         hi,
>         
>         i help maintain the packages GSVA and GSVAdata and i have a
>         question about the function mapIdentifiers() from the GSEABase
>         package, which i'm going to illustrate through an example.
>         
>         
>         1. let's first build an ExpressionSet object whose annotation
>         slot points to the human organism-level annotation package
>         org.Hs.eg.db:
>         
>         library(Biobase)
>         library(org.Hs.eg.db)
>         
>         mapped_genes <- mappedkeys(org.Hs.egSYMBOL)
>         
>         exp <- matrix(rnorm(1000), nrow=100,
>                      dimnames=list(mapped_genes[1:100],
>                                    paste("sample", 1:10, sep="")))
>         eset <- new("ExpressionSet", exprs=exp,
>                     annotation="org.Hs.eg.db")
>         eset
>         ExpressionSet (storageMode: lockedEnvironment)
>         assayData: 100 features, 10 samples
>          element names: exprs
>         protocolData: none
>         phenoData: none
>         featureData: none
>         experimentData: use 'experimentData(object)'
>         Annotation: org.Hs.eg.db
>         
>         2. now i'm going to load the Broad gene sets stored as a
>         GeneSetCollection object in the experimental data package
>         GSVAdata:
>         
>         library(GSVAdata)
>         data(c2BroadSets)
>         c2BroadSets
>         GeneSetCollection
>          names: NAKAMURA_CANCER_MICROENVIRONMENT_UP,
>         NAKAMURA_CANCER_MICROENVIRONMENT_DN, ...,
>         ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY (3272 total)
>          unique identifiers: 5167, 100288400, ..., 57191 (29340 total)
>          types in collection:
>            geneIdType: EntrezIdentifier (1 total)
>            collectionType: BroadCollection (1 total)
>         
>         
>         3. finally, i'd like to obtain a new GeneSetCollection object
>         whose identifiers have been mapped between the two classes of
>         identifiers in the GeneSetCollection and the ExpressionSet
>         objects.
>         
>         in this case both objects actually work with the same class of
>         identifiers (Entrez), so in fact i don't need to do that, but
>         this operation forms part of a piece of code in the package
>         GSVA which i'd like to work regardless of the kind of
>         annotation package referred to in the ExpressionSet object. i
>         had expected that the function mapIdentifiers() would have
>         some kind of idempotent behavior, but i get the following
>         error:
>         
>         gsc <- mapIdentifiers(c2BroadSets,
>                              AnnotationIdentifier(annotation(eset)))
>         Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose = verbose)) :
>           error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
>           object 'org.Hs.egENTREZID' not found
>         
>         
>         which does not occur if the feature names and annotation of
>         the ExpressionSet correspond to a classical affy chip (e.g.
>         "hgu95av2").
>         
>         i built the object c2BroadSets in the experiment data package
>         GSVAdata by importing the entire xml file from the Broad sets,
>         so i guess it is also possible that i did something wrong when
>         i built this 'c2BroadSets' object and that there's no problem,
>         bug or missing feature in mapIdentifiers().
>         
>         i look forward to your diagnosis and suggestions in any of
>         these possible directions.
>         
>         
>         thanks,
>         robert.
>         
>         _______________________________________________
>         Bioc-devel at r-project.org mailing list
>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
>