[Bioc-devel] idempotent identifier mapping with GSEABase::mapIdentifiers()
Robert Castelo
robert.castelo at upf.edu
Mon Mar 5 18:13:11 CET 2012
Martin, thank you very much, it works smoothly in devel. thanks also for
helping a GSVA user with exactly this issue on the BioC list yesterday;
your solution came right on time :)
robert.
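
As a rough sketch of the behaviour discussed in the quoted thread below (skip the lookup when the source and target identifier types already agree), the following base R snippet may help. It does not use GSEABase or AnnotationDbi: `map_ids()`, `toy_symbol` and the symbols "geneA" and "geneB" are made up for illustration, and the toy environment only stands in for an annotation map.

```r
# Hypothetical helper, not part of GSEABase or AnnotationDbi.
# 'map' is an environment from source ids to target ids (or NULL).
map_ids <- function(ids, from, to, map = NULL) {
  # Identity case: the ids are already of the requested type,
  # so return them unchanged instead of looking for a map.
  if (identical(from, to)) return(ids)
  if (is.null(map)) stop("no map available from ", from, " to ", to)
  # mget() fills in NA for ids that are absent from the map.
  hits <- mget(ids, envir = map, ifnotfound = list(NA_character_))
  # Keep the first target id for each source id.
  vapply(hits, function(h) as.character(h)[1], character(1),
         USE.NAMES = FALSE)
}

# Toy map with made-up symbols, standing in for an annotation environment.
toy_symbol <- new.env(hash = TRUE)
assign("1000", "geneA", envir = toy_symbol)
assign("5167", "geneB", envir = toy_symbol)

# Same type on both sides: no map is needed and the ids come back as given.
same <- map_ids(c("1000", "5167"), "ENTREZID", "ENTREZID")

# Different types: look the ids up, with NA for the unknown id "0".
syms <- map_ids(c("1000", "5167", "0"), "ENTREZID", "SYMBOL", toy_symbol)
```

Checking the identifier types before touching any map means the caller does not need an identity map such as org.Hs.egENTREZID to exist at all.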
On Fri, 2012-03-02 at 10:17 -0800, Martin Morgan wrote:
> On 02/27/2012 09:08 AM, Robert Castelo wrote:
> > thanks Vincent,
> >
> > what you suggest fixes the situation temporarily and hopefully, as
> > Martin said in his message before, this can have a more generic
> > solution.
>
> This is fixed for Entrez -> Annotation identifier maps in 1.17.3 (in the
> development branch). There may be other idempotent maps that are not
> supported.
>
> > your suggestion makes me think that, in fact, it could be of general
> > interest to add a *ENTREZID (identity) map for every entrez-based
> > organism-level annotation package. i think this could be useful in every
> > situation in which one would like to programmatically retrieve the
> > entrez id of a feature using any annotation package without knowing
> > whether the feature is already an entrez id.
>
> Some of this is addressed using the 'select' interface in the
> forthcoming version of AnnotationDbi (thanks, Marc!) where one can
> specify 'key' and 'cols' that typically allow for idempotent maps.
>
> Martin
>
> >
> > robert.
> >
> >
> > On Mon, 2012-02-27 at 07:45 -0500, Vincent Carey wrote:
> >> I have run into a very similar situation. Ultimately a uniformization
> >> of the annotation API will be called for.
> >> I wonder if a global short-term fixup would get you through this
> >> situation?
> >>
> >>> org.Hs.egENTREZID = new.env(hash=TRUE)
> >>> k = mappedkeys(org.Hs.egENSEMBL) # or any other good source of all keys
> >>> for (i in seq_along(k)) assign(k[i], k[i], org.Hs.egENTREZID)
> >>> get("1000", org.Hs.egENTREZID)
> >> [1] "1000"
> >>
> >>
> >> On Mon, Feb 27, 2012 at 6:25 AM, Robert Castelo
> >> <robert.castelo at upf.edu> wrote:
> >> hi,
> >>
> >> i help maintain the packages GSVA and GSVAdata and i have a
> >> question about the function mapIdentifiers() from the GSEABase
> >> package, which i'm going to illustrate through an example.
> >>
> >>
> >> 1. let's first build an ExpressionSet object whose annotation slot
> >> points to the human organism-level annotation package
> >> org.Hs.eg.db:
> >>
> >> library(Biobase)
> >> library(org.Hs.eg.db)
> >>
> >> mapped_genes <- mappedkeys(org.Hs.egSYMBOL)
> >>
> >> exp <- matrix(rnorm(1000), nrow=100,
> >>               dimnames=list(mapped_genes[1:100],
> >>                             paste("sample", 1:10, sep="")))
> >> eset <- new("ExpressionSet", exprs=exp,
> >>             annotation="org.Hs.eg.db")
> >> eset
> >> ExpressionSet (storageMode: lockedEnvironment)
> >> assayData: 100 features, 10 samples
> >> element names: exprs
> >> protocolData: none
> >> phenoData: none
> >> featureData: none
> >> experimentData: use 'experimentData(object)'
> >> Annotation: org.Hs.eg.db
> >>
> >> 2. now i'm going to load the Broad gene sets stored as a
> >> GeneSetCollection object in the experimental data package
> >> GSVAdata:
> >>
> >> library(GSVAdata)
> >> data(c2BroadSets)
> >> c2BroadSets
> >> GeneSetCollection
> >> names: NAKAMURA_CANCER_MICROENVIRONMENT_UP,
> >> NAKAMURA_CANCER_MICROENVIRONMENT_DN, ...,
> >> ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY (3272 total)
> >> unique identifiers: 5167, 100288400, ..., 57191 (29340 total)
> >> types in collection:
> >> geneIdType: EntrezIdentifier (1 total)
> >> collectionType: BroadCollection (1 total)
> >>
> >>
> >> 3. finally, i'd like to obtain a new GeneSetCollection object whose
> >> identifiers have been mapped from the identifier type of the
> >> GeneSetCollection to that of the ExpressionSet.
> >>
> >> in this case both objects actually use the same type of identifier
> >> (Entrez), so i don't need to map anything, but this operation is part
> >> of a piece of code in the package GSVA which i'd like to work
> >> regardless of the kind of annotation package referred to in the
> >> ExpressionSet object. i had expected mapIdentifiers() to behave
> >> idempotently here, but i get the following error:
> >>
> >> gsc <- mapIdentifiers(c2BroadSets,
> >>                       AnnotationIdentifier(annotation(eset)))
> >> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ...,
> >>     verbose = verbose)) :
> >>   error in evaluating the argument 'object' in selecting a method for
> >>   function 'GeneSetCollection': Error in get(mapName, envir = pkgEnv,
> >>   inherits = FALSE) :
> >>   object 'org.Hs.egENTREZID' not found
> >>
> >>
> >> this error does not occur if the feature names and annotation of the
> >> ExpressionSet correspond to a classical affy chip (e.g. "hgu95av2").
> >>
> >> i built the object c2BroadSets in the experiment data package GSVAdata
> >> by importing the entire xml file of the Broad gene sets, so it is also
> >> possible that i did something wrong when building it and that there is
> >> no bug or missing feature in mapIdentifiers().
> >>
> >> i look forward to your diagnosis and suggestions in either of these
> >> directions.
> >>
> >>
> >> thanks,
> >> robert.
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel