[Bioc-devel] GSEABase::mapIdentifiers() stopped mapping identifiers

Robert Castelo robert.castelo at upf.edu
Wed Mar 14 15:08:23 CET 2012


dear list, dear Martin,

recently

https://stat.ethz.ch/pipermail/bioc-devel/2012-March/003173.html

the mapping of identifiers between ExpressionSet objects and
GeneSetCollection objects was modified to avoid an unsuccessful mapping
operation when *both* objects had features based on Entrez Gene
identifiers since the function mapIdentifiers() would not find a
corresponding org.Hs.egENTREZID bimap.

however, it seems that this last modification has broken the regular
mapping between two different kinds of identifiers. this does not
show up as an error during the mapping, but it may have unpredictable
consequences downstream; for instance, it currently breaks the GSVA
vignette because the feature ids in the GeneSetCollection object do not
map to the feature ids in the ExpressionSet object. here is the code
reproducing the problem:

library(GSEABase)
library(GSVAdata)

data(leukemia)
annotation(leukemia_eset) ## hgu95a chip!
[1] "hgu95a"

data(c2BroadSets)

gsc_hgu95a <- mapIdentifiers(c2BroadSets,
                             AnnotationIdentifier(annotation(leukemia_eset)))

head(lapply(geneIds(gsc_hgu95a), head)) ## these are not hgu95a IDs!
$NAKAMURA_CANCER_MICROENVIRONMENT_UP
[1] "5167"      "100288400" "338328"    "388"       "10631"     "440387"

$NAKAMURA_CANCER_MICROENVIRONMENT_DN
[1] "55215" "9319"  "81610" "9455"  "64759" "8767" 

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP
[1] "5142"   "6781"   "580"    "6713"   "112950" "11182" 

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_DN
[1] "125"  "2619" "5919" "4856" "5156" "4046"

$WINTER_HYPOXIA_UP
[1] "7022"   "404550" "5738"   "9456"   "5230"   "10856" 

$WINTER_HYPOXIA_DN
[1] "5168"  "9452"  "3112"  "91526" "55843" "9459" 

sessionInfo()
R Under development (unstable) (2012-01-31 r58242)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GSVAdata_0.99.3       hgu95a.db_2.6.3       org.Hs.eg.db_2.6.4   
 [4] RSQLite_0.11.1        DBI_0.2-5             GSEABase_1.17.3      
 [7] graph_1.33.1          annotate_1.33.2       AnnotationDbi_1.17.23
[10] Biobase_2.15.4        BiocGenerics_0.1.11  

loaded via a namespace (and not attached):
[1] IRanges_1.13.27 stats4_2.15.0   tools_2.15.0    XML_3.9-4      
[5] xtable_1.7-0   

note that the identifiers shown in the previous gene sets should have
been hgu95a probeset identifiers, which is what you get when you run
the same code with the current release version of GSEABase:

library(GSEABase)
library(GSVAdata)
data(leukemia)
data(c2BroadSets)
gsc_hgu95a <- mapIdentifiers(c2BroadSets,
                             AnnotationIdentifier(annotation(leukemia_eset)))
head(lapply(geneIds(gsc_hgu95a), head))
$NAKAMURA_CANCER_MICROENVIRONMENT_UP
[1] "342_at"    "343_s_at"  "1826_at"   "1451_s_at" "33436_at"  "32488_at"

$NAKAMURA_CANCER_MICROENVIRONMENT_DN
[1] "32617_at" "36813_at" "38292_at" "41384_at" "36205_at" "39425_at"

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP
[1] "33705_at" "41354_at" "1801_at"  "35839_at" "33432_at" "36907_at"

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_DN
[1] "35730_at" "41839_at" "661_at"   "34407_at" "39250_at" "1731_at" 

$WINTER_HYPOXIA_UP
[1] "40303_at"   "41010_at"   "31488_s_at" "37677_at"   "35758_at"  
[6] "34301_r_at"

$WINTER_HYPOXIA_DN
[1] "41123_s_at" "41124_r_at" "41125_r_at" "40775_at"   "38570_at"  
[6] "37543_at"  

sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GSVAdata_0.99.3       hgu95a.db_2.6.3       org.Hs.eg.db_2.6.4   
 [4] RSQLite_0.11.1        DBI_0.2-5             GSEABase_1.16.0      
 [7] graph_1.32.0          annotate_1.32.1       AnnotationDbi_1.16.16
[10] Biobase_2.14.0       

loaded via a namespace (and not attached):
[1] IRanges_1.12.6 tools_2.14.0   XML_3.9-4      xtable_1.7-0  
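
since the failed mapping does not raise an error, one possible way to catch it early is to check how many of the mapped ids are actually found among the feature names of the expression data. the following is only a minimal sketch in base R: idOverlap() is a hypothetical helper, not a GSEABase function, and the toy vectors stand in for the real output of geneIds() and featureNames().

```r
## hypothetical helper, not part of GSEABase: for each gene set, the
## fraction of its ids found among the feature names of the expression data
idOverlap <- function(geneSets, features) {
  vapply(geneSets,
         function(ids) {
           if (length(ids) == 0L) NA_real_ else mean(ids %in% features)
         },
         numeric(1))
}

## toy data (assumed for illustration) standing in for
## featureNames(leukemia_eset) and geneIds(gsc_hgu95a)
features <- c("342_at", "343_s_at", "1826_at", "32617_at")
mapped   <- list(SET_A = c("342_at", "343_s_at"),
                 SET_B = c("32617_at", "9999_at"))
unmapped <- list(SET_A = c("5167", "388"),
                 SET_B = c("55215", "9319"))

okOverlap  <- idOverlap(mapped, features)
badOverlap <- idOverlap(unmapped, features)

## an overlap of zero in every gene set suggests the mapping did nothing
if (all(badOverlap == 0, na.rm = TRUE)) {
  message("no gene set ids match the feature names")
}
```

with the objects from the example above, the corresponding call would presumably be idOverlap(geneIds(gsc_hgu95a), featureNames(leukemia_eset)), which should give zero everywhere when the ids are still Entrez Gene identifiers.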


i'm sorry that my previous request for idempotent maps broke the more
fundamental mapping functionality but i hope that this has an easy fix.

thanks!!
robert.


