[Bioc-devel] GSEABase::mapIdentifiers() stopped mapping identifiers

Robert Castelo robert.castelo at upf.edu
Tue Mar 20 16:54:44 CET 2012


hi, since we're approaching release and a few days happened since my last
email exchange about this, let me bring again what the problem is that
is breaking the GSVA vignette and thus, its build on R-devel with a
specific example reproducing the problem:

library(Biobase)
library(GSEABase)
library(GSVAdata)

data(c2BroadSets)
data(leukemia)
class(leukemia_eset)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"
annotation(leukemia_eset)
[1] "hgu95a"


gsc <- mapIdentifiers(c2BroadSets,
                      AnnotationIdentifier(annotation(leukemia_eset)))

head(lapply(geneIds(gsc), head))
$NAKAMURA_CANCER_MICROENVIRONMENT_UP
[1] "5167"      "100288400" "338328"    "388"       "10631"     "440387"   

$NAKAMURA_CANCER_MICROENVIRONMENT_DN
[1] "55215" "9319"  "81610" "9455"  "64759" "8767" 

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP
[1] "5142"   "6781"   "580"    "6713"   "112950" "11182" 

$WEST_ADRENOCORTICAL_TUMOR_MARKERS_DN
[1] "125"  "2619" "5919" "4856" "5156" "4046"

$WINTER_HYPOXIA_UP
[1] "7022"   "404550" "5738"   "9456"   "5230"   "10856" 

$WINTER_HYPOXIA_DN
[1] "5168"  "9452"  "3112"  "91526" "55843" "9459" 

note that these identifiers are still the Entrez Gene Identifiers from
the GeneSetCollection object 'c2BroadSets' while they should have been
translated into hgu95a probeset IDs since these are the kind of identifiers
in the ExpressionSet object 'leukemia_eset'.

the vignette in GSVA breaks because no gene in these "mapped" gene sets
matches any probeset ID in the ExpressionSet. I think this could potentially
break the code in the vignette of other package using the function mapIdentifiers().

the most straightforward solution is to revert back to GSEABase 1.17.2 and then
find later a workaround for the original issue of idempotent maps between
GeneSetCollection and ExpressionSet objects
(see https://mailman.stat.ethz.ch/pipermail/bioc-devel/2012-February/003151.html)

sessionInfo()
R Under development (unstable) (2012-01-31 r58242)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GSVAdata_0.99.3       hgu95a.db_2.6.3       org.Hs.eg.db_2.6.4   
 [4] RSQLite_0.11.1        DBI_0.2-5             GSEABase_1.17.3      
 [7] graph_1.33.1          annotate_1.33.2       AnnotationDbi_1.17.27
[10] Biobase_2.15.4        BiocGenerics_0.1.14  

loaded via a namespace (and not attached):
[1] IRanges_1.13.27 stats4_2.15.0   tools_2.15.0    XML_3.9-4      
[5] xtable_1.7-0   

i'm pasting below our last communication about this issue.

thanks!
robert.

============================================================
hi Martin,

On Wed, 2012-03-14 at 15:58 -0700, Martin Morgan wrote:
[...]

> So the answer to the original post should have been closer to 'use 
> EntrezIdentifier for org.Hs.eg.db annotations'? As in the final line of
> 
> library(Biobase)
> library(org.Hs.eg.db)
> library(GSVAdata)
> data(c2BroadSets)
> mapped_genes <- mappedkeys(org.Hs.egSYMBOL)
> exp <- matrix(rnorm(1000), nrow=100,
>      dimnames=list(mapped_genes[1:100], paste("sample", 1:10, sep="")))
> eset <- new("ExpressionSet", exprs=exp, annotation="org.Hs.eg.db")
> gsc <- mapIdentifiers(c2BroadSets, EntrezIdentifier(annotation(eset)))
> 
> class?GeneIdentifierType says AnnotationIdentifier is meant for 
> Affymetrix chip packages (there seem to be some underlying problems 
> anyway, GeneSet(eset) creates AnnotationIdentifier).

so if you revert now to the former behavior of GSEABase, how can i
programmatically derive whether the outcome of

annotation(eset)

should be the input to

AnnotationIdentifer()

or to

EntrezIdentifier()

or to any other among the *Identifier() functions described in
class?GeneIdentifierType


??


thanks,
robert.

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list