[Bioc-devel] mapping probes/probesets between platforms

Hervé Pagès hpages at fhcrc.org
Thu Jul 22 19:39:12 CEST 2010

Hi Peter,

On 07/21/2010 03:24 PM, Sean Davis wrote:
> There are many alternatives for alignment including blast, blat, gmap,
> ssaha, etc.  This could also be done with biostrings.
> Sean
> On Jul 21, 2010 2:43 PM, "Bazeley, Peter"<Peter.Bazeley at rockets.utoledo.edu>
> wrote:
> Hi Sean,
> For the second option, is there a specific package that you had in mind?
> Perhaps the GenomicFeatures package, which can download UCSC tables.

Depends which "second option" we are talking about.

In Sean's original suggestion:

   the second is to map between common gene ids (unigene, ensg, entrez
   gene id, hgnc gene symbol, etc.)

I think you could use our regular .db annotations which are
gene-centric. They contain all kinds of mappings between probeset ids
and other ids like Entrez Gene ids. If there is no .db package for
your platform, look at the SQLforge vignette in the AnnotationDbi
package for how to make your own.

Here is an example of how you could map hgu95av2 probesets to hgu133b
probesets based on their corresponding Entrez Gene ids:


   library(hgu95av2.db)
   library(hgu133b.db)

   ## See the mappings available for hgu95av2:
   ls("package:hgu95av2.db")
   ## and for hgu133b:
   ls("package:hgu133b.db")

   ## Make sure the ENTREZID mappings are what you are looking for:
   x <- toTable(hgu95av2ENTREZID)
   x[1:10, ]
   y <- toTable(hgu133bENTREZID)
   y[1:10, ]

   ## Combine the 'x' and 'y' data frames, i.e. "join" them on the
   ## values in their gene_id columns:
   common_ids <- intersect(x$gene_id, y$gene_id)
   ## keep the hgu95av2 rows whose gene is also on hgu133b
   xy <- x[x$gene_id %in% common_ids, ]
   ## hgu133b probeset ids, named by gene id
   y_id2probe <- y$probe_id
   names(y_id2probe) <- y$gene_id
   y_id2probe <- y_id2probe[names(y_id2probe) %in% common_ids]
   ## one vector of hgu133b probeset ids per gene, in the row order of 'xy'
   tmp <- split(y_id2probe, names(y_id2probe))
   tmp <- tmp[xy$gene_id]
   ## repeat each row of 'xy' once per matching hgu133b probeset
   xy <- xy[rep.int(seq_len(nrow(xy)), lengths(tmp)), ]
   row.names(xy) <- NULL
   xy$probe_id2 <- unname(unlist(tmp))
   xy[1:10, ]  # the mapping is not one-to-one
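
The same kind of join can probably be written more compactly with
base R's merge(). Here is a minimal sketch on toy data frames that
only mimic the shape of the toTable() output (probe_id and gene_id
columns); every id in it is invented for illustration and is not
real annotation:

```r
## Toy stand-ins for toTable(hgu95av2ENTREZID) and toTable(hgu133bENTREZID).
## All ids below are made up for illustration only.
x <- data.frame(probe_id = c("a1_at", "a2_at", "a3_at"),
                gene_id  = c("10", "20", "30"),
                stringsAsFactors = FALSE)
y <- data.frame(probe_id = c("b1_at", "b2_at", "b3_at", "b4_at"),
                gene_id  = c("20", "20", "30", "40"),
                stringsAsFactors = FALSE)

## Inner join on gene_id. The two probe_id columns clash, so merge()
## appends the suffixes: "" for 'x' and "2" for 'y'.
xy <- merge(x, y, by = "gene_id", suffixes = c("", "2"))

## Put the columns in the same order as in the longer version
xy <- xy[, c("probe_id", "gene_id", "probe_id2")]
xy  # one row per (probeset in x, probeset in y) pair sharing a gene
```

Genes present on only one platform are dropped, and a gene with
several probesets on either side gives several rows, so this mapping
is not one-to-one either.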

But maybe by "second option" you meant Sean's other, entirely different
suggestion: realigning the probe sequences to a transcript library.
This is more complicated but still doable with our tools.
The main steps could be:

   - use the GenomicFeatures package together with the BSgenome
     data package that contains the reference genome for your platforms
     to extract the transcriptome sequences;

   - use vwhichPDict() (from the Biostrings package) twice: first to
     map the probe sequences in hgu95av2probe to the transcriptome
     and a second time to map the probe sequences in hgu133bprobe to
     the transcriptome;

   - combine the 2 mappings in a fashion similar to how 'x' and 'y'
     above were combined.

Note that this time the result will be a mapping between the probes
(not the probesets) of the 2 platforms. Some extra work would then
be needed to convert this into a mapping between probeset ids.
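
To make the logic of these steps concrete, here is a toy sketch in
base R. It rests on loud assumptions: the sequences and ids are
invented, the column names are modelled on the probe packages, and
exact substring matching with grepl(fixed = TRUE) is only a stand-in
for vwhichPDict() on a real transcriptome. The map_probes() helper is
hypothetical and not part of any package.

```r
## Toy transcriptome (invented sequences, stand-in for the real one)
transcripts <- c(tx1 = "ACGTACGTTTGACCA", tx2 = "GGGCATCATTTACGA")

## Toy probe tables; all values are invented
probesA <- data.frame(sequence = c("ACGTAC", "TTGACC", "CATCAT"),
                      Probe.Set.Name = c("A1_at", "A1_at", "A2_at"),
                      stringsAsFactors = FALSE)
probesB <- data.frame(sequence = c("GTTTGA", "TTTACG", "AAAAAA"),
                      Probe.Set.Name = c("B1_at", "B2_at", "B3_at"),
                      stringsAsFactors = FALSE)

## Hypothetical helper: for each probe, find the transcripts that
## contain its sequence exactly (a stand-in for the Biostrings matching)
map_probes <- function(probes, transcripts) {
    hits <- lapply(probes$sequence, function(s)
        names(transcripts)[grepl(s, transcripts, fixed = TRUE)])
    n <- lengths(hits)
    data.frame(probeset = rep(probes$Probe.Set.Name, n),
               sequence = rep(probes$sequence, n),
               tx = as.character(unlist(hits)),
               stringsAsFactors = FALSE)
}

## Map each platform's probes to the transcriptome
mapA <- map_probes(probesA, transcripts)
mapB <- map_probes(probesB, transcripts)

## Join the 2 mappings on the transcript they hit: probe-level mapping
probe_map <- merge(mapA, mapB, by = "tx", suffixes = c("A", "B"))

## Extra step: collapse the probe pairs into unique probeset pairs
probeset_map <- unique(probe_map[, c("probesetA", "probesetB")])
probeset_map
```

Probes that hit no transcript (like the "AAAAAA" toy probe) simply
drop out of the result. In a real run you would also have to decide
how many probes of a probeset must hit a transcript before you call
the two probesets equivalent.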

Hope that helps.


> Thanks,
> Pete
> ________________________________
> From: seandavi at gmail.com [seandavi at gmail.com] on behalf of Sean Davis [
> sdavis2 at mail.nih.gov]
> Sent: Wednesday, July 21, 2010 7:27 AM
> To: Bazeley, Peter
> Cc: bioc-devel at stat.math.ethz.ch
> Subject: Re: [Bioc-devel] mapping probes/probesets between platforms
> On Tue, Jul 20, 2010 at 10:50 PM, Bazeley, Peter<
> Peter.Bazeley at rockets.utoledo.edu<mailto:Peter....
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
