[Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

Morgan, Martin Martin.Morgan at roswellpark.org
Fri Dec 18 19:15:39 CET 2015

Hi Ludwig --

It would be really great to see what you've put together; can you make your code available somewhere, maybe via GitHub?

I think the facilities already in Bioconductor include:

- select() and the OrganismDb (e.g., Homo.sapiens) packages

- (Recently introduced, in bioc-devel) GenomicFeatures::mapIds()

- GSEABase mapIdentifiers()

- The AnnotationFuncs package (some of this functionality might be redundant with select() / mapIds(); maybe your idea is a more refined version of this?)

- biomaRt, including the relatively little-known use of select() with mart objects.

I think a particularly valuable development (initial implementation in GenomicFeatures::mapIds()) is transparent mapping to and from genomic ranges.

The original intention of the annotation() slot in ExpressionSet was to hold the microarray chip identifier, so that it can be referenced when translating from probe set to gene identifiers.

From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Ludwig Geistlinger [Ludwig.Geistlinger at bio.ifi.lmu.de]
Sent: Thursday, December 17, 2015 5:05 AM
To: bioc-devel at r-project.org
Subject: [Bioc-devel] mapIds methods for ExpressionSet and SummarizedExperiment

Dear Bioc Team,

I have implemented mapIds methods that map featureNames (ExpressionSet) and
rownames (SummarizedExperiment) between major gene ID types such as
ENSEMBL and ENTREZ by passing them on to AnnotationDbi::mapIds.

Given an ExpressionSet/SummarizedExperiment and an organism under
investigation such as 'Homo sapiens', the methods check whether the
corresponding org.db package is available; if it is not, the package is
automatically installed and loaded.
Subsequently, the featureNames/rownames are mapped from the specified
from.id.type to the desired to.id.type, which correspond to keytypes of the
org.db package.
Options for handling NA and duplicate mappings are also provided to
ensure that featureNames/rownames are unique after the mapping.
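A minimal base-R sketch of two of the steps described above, for readers who want to picture the logic. It is not Ludwig's code: the helper names org_pkg_for() and resolve_mapping(), the small organism table, and the na option are hypothetical. The input to resolve_mapping() is assumed to be the named character vector that AnnotationDbi::mapIds() returns with multiVals = "first" (names are the original IDs, values the mapped IDs, NA where nothing maps).

```r
## Hypothetical helper: map an organism name to its org.*.db package.
## Only a few organisms are listed; the naming is not fully regular
## (e.g. yeast uses SGD identifiers), hence an explicit lookup table.
org_pkg_for <- function(organism) {
    pkgs <- c("Homo sapiens"             = "org.Hs.eg.db",
              "Mus musculus"             = "org.Mm.eg.db",
              "Rattus norvegicus"        = "org.Rn.eg.db",
              "Saccharomyces cerevisiae" = "org.Sc.sgd.db")
    pkg <- unname(pkgs[organism])
    if (is.na(pkg))
        stop("no org.*.db package known for '", organism, "'")
    ## A full implementation would now check
    ## requireNamespace(pkg, quietly = TRUE) and install the package if missing.
    pkg
}

## Hypothetical helper: make a mapping safe to use as new row names.
## 'mapped' is assumed to come from a call such as
##   AnnotationDbi::mapIds(org.Hs.eg.db, keys = rownames(se),
##                         column = "ENTREZID", keytype = "ENSEMBL",
##                         multiVals = "first")
resolve_mapping <- function(mapped, na = c("drop", "keep")) {
    na <- match.arg(na)
    miss <- is.na(mapped)
    if (na == "drop") {
        mapped <- mapped[!miss]              # discard unmapped features
    } else {
        mapped[miss] <- names(mapped)[miss]  # fall back to the original ID
    }
    ## several original IDs may map to the same target ID: keep the first
    mapped[!duplicated(mapped)]
}
```

For a SummarizedExperiment se, the result m could then be applied with se <- se[names(m), ] followed by rownames(se) <- unname(m). Keeping only the first source ID per target is just one possible duplicate policy; summarizing the duplicated rows would be another.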

The advantage is that end users need no knowledge of the Bioconductor
annotation infrastructure; they just provide the organism under
investigation in a format that is convenient also for non-Bioconductor users.

I have not found anything similar in existing packages, and I am wondering
whether this could be of general interest.


Dipl.-Bioinf. Ludwig Geistlinger

Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2nd floor, Room A201
80333 München

Tel.: 089-2180-4067
eMail: Ludwig.Geistlinger at bio.ifi.lmu.de


