[BioC] GSEABase how to map gene symbols to mouse EntrezId or Affy
Martin Morgan
mtmorgan at fhcrc.org
Thu May 15 20:47:40 CEST 2008
"Vladimir Morozov" <vmorozov at als.net> writes:
> Martin,
>
> You are right that disagreement between human and mouse symbols is the
> problem. But you should still get some mapping if you translate symbols
> into capwords
>> sum(!is.na(mget(gss[[1]]@geneIds,org.Mm.egSYMBOL2EG,ifnotfound=NA)))
> [1] 0
Always use accessors, e.g., geneIds(gss[[1]]), rather than direct slot
access with @.
> sum(!is.na(mget(capwords(tolower(gss[[1]]@geneIds)),org.Mm.egSYMBOL2EG,
> ifnotfound=NA)))
> [1] 46
Be nice to your helpers by giving complete examples; I guess capwords is
> capwords <- function(x) sub("^([a-z])", "\\U\\1", x, perl=TRUE)
then
> cids <- capwords(tolower(geneIds(gss[[1]])))
> egids <- mget(cids, org.Mm.egSYMBOL2EG, ifnotfound=NA)
> egids <- egids[!is.na(egids)]
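The case-normalisation lookup above can be tried without any annotation package. This is only a sketch in base R: toy_symbol2eg is a hypothetical stand-in for org.Mm.egSYMBOL2EG, and the identifier values in it are made-up placeholders, not real Entrez ids.

```r
# capwords as guessed in the post: upper-case a leading lower-case letter.
capwords <- function(x) sub("^([a-z])", "\\U\\1", x, perl = TRUE)

# Hypothetical stand-in for org.Mm.egSYMBOL2EG (assumption, not the real map):
# keys are mouse-style symbols, values are placeholder identifiers.
toy_symbol2eg <- list2env(list(Brca1 = "id_002", Myc = "id_003"))

# Human-style symbols as they might appear in a gene set.
human_style <- c("BRCA1", "MYC", "C5orf13")

# Lower-case, then title-case, then look up; unmatched keys come back as NA.
cids <- capwords(tolower(human_style))
egids <- mget(cids, envir = toy_symbol2eg, ifnotfound = NA)

# Keep only the symbols that were found.
mapped <- egids[!is.na(egids)]
n_mapped <- length(mapped)
```

Note that the surviving names are the title-cased symbols, not the original ones, which matters as soon as the result is used as a map keyed by the gene set's own identifiers.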
> Let's say I will figure out some mapping using ortholog or alias names.
> Will I screw the GeneSet data structure by
> gss2 <- lapply(gss,function(x){x@geneIds <-
> my.mapping(x@geneIds);x@geneIdType@type <- 'EntrezIdentifier'})
More on this below... mapIdentifiers provides a convenient side door
in the form of
> showMethods('mapIdentifiers', class='environment')
Function: mapIdentifiers (package GSEABase)
what="GeneColorSet", to="GeneIdentifierType", from="environment"
what="GeneSet", to="GeneIdentifierType", from="environment"
which is to say that if you have a custom mapping you can represent it
as an environment with keys equal to the identifiers you're mapping
from and values the identifiers you're mapping to, e.g.,
> names(egids) <- toupper(names(egids))
> env <- l2e(egids)
> mapIdentifiers(gss[[1]], EntrezIdentifier(), env)
Probably you want to record information about the identifiers you are
mapping to (e.g., that they are mouse) by using, as the second argument,
EntrezIdentifier('org.Mm.eg.db')
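To make the shape of such an environment concrete, here is a small base-R sketch of the re-keying step. It assumes list2env() as a stand-in for Biobase's l2e(), and the identifier values are again made-up placeholders. It only illustrates the "keys are what you map from, values are what you map to" layout and does not call mapIdentifiers itself.

```r
# Identifiers as they appear in the (human-style) gene set.
set_ids <- c("BRCA1", "MYC", "C5orf13")

# Result of a lookup done on title-cased symbols (placeholder values).
egids <- list(Brca1 = "id_002", Myc = "id_003")

# Re-key by upper-cased names so the keys match the ids stored in the set.
names(egids) <- toupper(names(egids))

# Base-R stand-in for l2e(): turn the named list into an environment.
env <- list2env(egids)

# Identifiers in the set that have an entry in the map, and their targets.
found <- set_ids[set_ids %in% ls(env)]
mapped <- unlist(mget(found, envir = env))
```

One caveat of this sketch: toupper() only restores the original key when the original identifier was all upper case. A mixed-case symbol such as C5orf13 would be re-keyed as C5ORF13 and no longer match, so keying by the original identifiers directly may be safer.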
There doesn't seem to be a method defined for gene set collections (an
oversight), but you can
> GeneSetCollection(lapply(gss, mapIdentifiers, EntrezIdentifier(), env))
back to...
> gss2 <- lapply(gss,function(x){x@geneIds <-
> my.mapping(x@geneIds);x@geneIdType@type <- 'EntrezIdentifier'})
There are a bunch of ways through this, but I would avoid using direct
slot access. One possibility would be
> my.mapping <- force
> gss2 <- GeneSetCollection(lapply(gss, function(x) {
> GeneSet(EntrezIdentifier('org.Mm.eg.db'),
> geneIds=my.mapping(geneIds(x)),
> setName=setName(x))
> }))
Martin
> ?
>
>
>
> Vladimir Morozov
>
>
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, May 15, 2008 12:56 PM
> To: Vladimir Morozov
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] GSEABase how to map gene symbols to mouse EntrezId
> or Affy
>
> Hi Vladimir --
>
> "Vladimir Morozov" <vmorozov at als.net> writes:
>
>> Hi
>>
>> Any suggestions how to map gene symbols to mouse EntrezId (preferred)
>> or Affy.
>> Mapping to Entrez is apparently not supported by GSEABase
>>> mapIdentifiers(gss,EntrezIdentifier())
>> Error in .mapIdentifiers_isMappable(from, to) :
>> unable to map from 'Symbol' to 'EntrezId'
>> neither GeneIdentifierType has annotation
>
> mapIdentifiers needs to know where to look for the map. I guess the way
> you created gss means that it doesn't know about the organism you're
> using, and EntrezIdentifier() also doesn't. What you want is
>
>> mapIdentifiers(gss, EntrezIdentifier("org.Mm.eg.db"))
> GeneSetCollection
> names: chr5q23, chr16q24 (2 total)
> unique identifiers: (0 total)
> types in collection:
> geneIdType: EntrezIdentifier (1 total)
> collectionType: BroadCollection (1 total)
>
> Here I'm using (and I guess you are too) the gss that comes from
> example(getBroadSets). These are human genes, and have no corresponding
> mouse equivalents (see below)...
>
>> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ...,
>> verbose = verbose)) :
>> error in evaluating the argument 'object' in selecting a method for
>> function 'GeneSetCollection'
>>
>>
>> Mapping to Affys works for human, but not for mouse
>>> mapIdentifiers(gss, AnnotationIdentifier("hgu95av2.db"))
>> GeneSetCollection
>> names: chr5q23, chr16q24 (2 total)
>> unique identifiers: 35089_at, 35090_g_at, ..., 35807_at (79 total)
>> types in collection:
>> geneIdType: AnnotationIdentifier (1 total)
>> collectionType: BroadCollection (1 total)
>>> mapIdentifiers(gss, AnnotationIdentifier("mouse4302.db"))
>> GeneSetCollection
>> names: chr5q23, chr16q24 (2 total)
>> unique identifiers: (0 total)
>> types in collection:
>> geneIdType: AnnotationIdentifier (1 total)
>> collectionType: BroadCollection (1 total)
>
> This is because the identifiers are not mouse identifiers
>
>> ids <- unique(unlist(geneIds(gss)))
>> egs <- mget(ids, revmap(mouse4302ENTREZID), ifnotfound=NA)
>> sum(!sapply(egs, is.na))
> [1] 0
>
>>>
>>
>>
>> Thanks
>>
>>
>> Vladimir Morozov
>>
>>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center 1100
> Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>