[Bioc-devel] idempotent identifier mapping with GSEABase::mapIdentifiers()

Mon Mar 5 18:13:11 CET 2012

Martin, thank you very much, it works smoothly in devel. thanks also for
helping a GSVA user with exactly this issue on the BioC list yesterday;
your solution came right on time :)

robert.

On Fri, 2012-03-02 at 10:17 -0800, Martin Morgan wrote:
> On 02/27/2012 09:08 AM, Robert Castelo wrote:
> > thanks Vincent,
> >
> > what you suggest fixes the situation temporarily and hopefully, as
> > Martin said in his message before, this can have a more generic
> > solution.
> 
> This is fixed for Entrez -> Annotation identifier maps in 1.17.3 (in the 
> development branch). There may be other idempotent maps that are not 
> supported.
> 
> > your suggestion makes me think that, in fact, it could be of general
> > interest to add a *ENTREZID (identity) map for every entrez-based
> > organism-level annotation package. i think this could be useful in every
> > situation in which one would like to programmatically retrieve the
> > entrez id of a feature using any annotation package without knowing
> > whether the feature is already an entrez id.
> 
> Some of this is addressed using the 'select' interface in the 
> forthcoming version of AnnotationDbi (thanks, Marc!) where one can 
> specify 'key' and 'cols' that typically allow for idempotent maps.
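
A minimal sketch of what such an idempotent lookup might look like with the
'select' interface. It assumes AnnotationDbi and org.Hs.eg.db are installed,
the helper name lookup_entrez is hypothetical, and the argument referred to
as 'cols' above is spelled 'columns' here, as in later AnnotationDbi
releases:

```r
# Sketch only: assumes the Bioconductor packages AnnotationDbi and
# org.Hs.eg.db are installed. 'lookup_entrez' is a hypothetical helper name.
lookup_entrez <- function(ids, keytype = "ENTREZID") {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
    stop("this sketch needs AnnotationDbi and org.Hs.eg.db")
  }
  # Keying on ENTREZID while also requesting ENTREZID is the idempotent case;
  # the same call shape is used for other key types such as "SYMBOL".
  AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                        keys = ids,
                        columns = c("ENTREZID", "SYMBOL"),
                        keytype = keytype)
}
```

Calling lookup_entrez("1000") keys on Entrez ids directly, while
lookup_entrez("CDH2", keytype = "SYMBOL") starts from gene symbols; in both
cases the caller reads the ENTREZID column of the result without needing to
know in advance whether the input was already an Entrez id.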
> 
> Martin
> 
> >
> > robert.
> >
> >
> > On Mon, 2012-02-27 at 07:45 -0500, Vincent Carey wrote:
> >> I have run into a very similar situation.  Ultimately a uniformization
> >> of the annotation API will be called for.
> >> I wonder if a global short-term fixup would get you through this
> >> situation?
> >>
> >>> org.Hs.egENTREZID = new.env(hash=TRUE)
> >>> k = mappedkeys(org.Hs.egENSEMBL)  # or any other good source of all keys
> >>> for (i in seq_along(k)) assign(k[i], k[i], org.Hs.egENTREZID)
> >>> get("1000", org.Hs.egENTREZID)
> >> [1] "1000"
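
The same identity-map idea can be tried in base R without any annotation
package. This is only a sketch: the three keys below are a hypothetical
stand-in for the output of mappedkeys(), and identity_env is an illustrative
name, not an object provided by org.Hs.eg.db:

```r
# Hypothetical stand-in for mappedkeys(org.Hs.egENSEMBL): a few Entrez ids.
k <- c("1000", "5167", "57191")

# Build the identity environment in one step: each key maps to itself.
identity_env <- list2env(setNames(as.list(k), k), envir = new.env(hash = TRUE))

# Single lookup, as in the workaround above.
get("1000", envir = identity_env)

# Several lookups at once; the result is a named list.
mget(c("5167", "57191"), envir = identity_env)
```

Using list2env() replaces the explicit loop over keys with a single call.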
> >>
> >>
> >> On Mon, Feb 27, 2012 at 6:25 AM, Robert Castelo
> >> <robert.castelo at upf.edu>  wrote:
> >>          hi,
> >>
> >>          i collaborate in maintaining the packages GSVA and GSVAdata and i
> >>          have a question about the function mapIdentifiers() from the
> >>          GSEABase package, which i'm going to illustrate through an example.
> >>
> >>
> >>          1. let's first build an ExpressionSet object whose annotation slot
> >>          points to the human organism-level annotation package org.Hs.eg.db:
> >>
> >>          library(Biobase)
> >>          library(org.Hs.eg.db)
> >>
> >>          mapped_genes<- mappedkeys(org.Hs.egSYMBOL)
> >>
> >>          exp<- matrix(rnorm(1000), nrow=100,
> >>                       dimnames=list(mapped_genes[1:100],
> >>                                     paste("sample", 1:10, sep="")))
> >>          eset<- new("ExpressionSet", exprs=exp, annotation="org.Hs.eg.db")
> >>          eset
> >>          ExpressionSet (storageMode: lockedEnvironment)
> >>          assayData: 100 features, 10 samples
> >>           element names: exprs
> >>          protocolData: none
> >>          phenoData: none
> >>          featureData: none
> >>          experimentData: use 'experimentData(object)'
> >>          Annotation: org.Hs.eg.db
> >>
> >>          2. now i'm going to load the Broad gene sets stored as a
> >>          GeneSetCollection object in the experiment data package GSVAdata:
> >>
> >>          library(GSVAdata)
> >>          data(c2BroadSets)
> >>          c2BroadSets
> >>          GeneSetCollection
> >>           names: NAKAMURA_CANCER_MICROENVIRONMENT_UP,
> >>          NAKAMURA_CANCER_MICROENVIRONMENT_DN, ...,
> >>          ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY (3272 total)
> >>           unique identifiers: 5167, 100288400, ..., 57191 (29340 total)
> >>           types in collection:
> >>             geneIdType: EntrezIdentifier (1 total)
> >>             collectionType: BroadCollection (1 total)
> >>
> >>
> >>          3. finally, i'd like to obtain a new GeneSetCollection object whose
> >>          identifiers have been mapped from the identifier type of the
> >>          GeneSetCollection to that of the ExpressionSet.
> >>
> >>          in this case both objects actually use the same type of identifiers
> >>          (Entrez), so in fact i don't need to do that, but this operation is
> >>          part of a piece of code in the package GSVA which i'd like to work
> >>          regardless of the kind of annotation package referred to in the
> >>          ExpressionSet object. i had expected that the function
> >>          mapIdentifiers() would have some kind of idempotent behavior, but i
> >>          get the following error:
> >>
> >>          gsc<- mapIdentifiers(c2BroadSets,
> >>                               AnnotationIdentifier(annotation(eset)))
> >>          Error in GeneSetCollection(lapply(what, mapIdentifiers,
> >>          to, ..., verbose
> >>          = verbose)) :
> >>           error in evaluating the argument 'object' in selecting a
> >>          method for
> >>          function 'GeneSetCollection': Error in get(mapName, envir =
> >>          pkgEnv,
> >>          inherits = FALSE) :
> >>           object 'org.Hs.egENTREZID' not found
> >>
> >>
> >>          which does not occur if the feature names and annotation of the
> >>          ExpressionSet correspond to a classical affy chip (e.g. "hgu95av2").
> >>
> >>          i built the object c2BroadSets in the experiment data package
> >>          GSVAdata by importing the entire xml file from the Broad sets, so i
> >>          guess it is also possible that i did something wrong when i built
> >>          this 'c2BroadSets' object and there's no problem, bug or missing
> >>          feature in mapIdentifiers().
> >>
> >>          i look forward to your diagnosis and suggestions in any of these
> >>          possible directions.
> >>
> >>
> >>          thanks,
> >>          robert.
> >>
> >>          _______________________________________________
> >>          Bioc-devel at r-project.org mailing list
> >>          https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>