[Bioc-devel] GSEABase::mapIdentifiers() stopped mapping identifiers

Tue Mar 20 18:49:29 CET 2012

On 3/20/2012 8:54 AM, Robert Castelo wrote:
> hi, since we're approaching release and a few days happened since my last
> email exchange about this, let me bring again what the problem is that
> is breaking the GSVA vignette and thus, its build on R-devel with a
> specific example reproducing the problem:
>
> library(Biobase)
> library(GSEABase)
> library(GSVAdata)
>
> data(c2BroadSets)
> data(leukemia)
> class(leukemia_eset)
> [1] "ExpressionSet"
> attr(,"package")
> [1] "Biobase"
> annotation(leukemia_eset)
> [1] "hgu95a"
>
>
> gsc<- mapIdentifiers(c2BroadSets,
>                        AnnotationIdentifier(annotation(leukemia_eset)))
>
> head(lapply(geneIds(gsc), head))
> $NAKAMURA_CANCER_MICROENVIRONMENT_UP
> [1] "5167"      "100288400" "338328"    "388"       "10631"     "440387"
>
> $NAKAMURA_CANCER_MICROENVIRONMENT_DN
> [1] "55215" "9319"  "81610" "9455"  "64759" "8767"
>
> $WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP
> [1] "5142"   "6781"   "580"    "6713"   "112950" "11182"
>
> $WEST_ADRENOCORTICAL_TUMOR_MARKERS_DN
> [1] "125"  "2619" "5919" "4856" "5156" "4046"
>
> $WINTER_HYPOXIA_UP
> [1] "7022"   "404550" "5738"   "9456"   "5230"   "10856"
>
> $WINTER_HYPOXIA_DN
> [1] "5168"  "9452"  "3112"  "91526" "55843" "9459"
>
> note that these identifiers are still the Entrez Gene Identifiers from
> the GeneSetCollection object 'c2BroadSets' while they should have been
> translated into hgu95a probeset IDs since these are the kind of identifiers
> in the ExpressionSet object 'leukemia_eset'.
>
> the vignette in GSVA breaks because no gene in these "mapped" gene sets
> matches any probeset ID in the ExpressionSet. I think this could potentially
> break the code in the vignette of other package using the function mapIdentifiers().
>
> the most straightforward solution is to revert back to GSEABase 1.17.2 and then
> find later a workaround for the original issue of idempotent maps between
> GeneSetCollection and ExpressionSet objects
> (see https://mailman.stat.ethz.ch/pipermail/bioc-devel/2012-February/003151.html)
>
> sessionInfo()
> R Under development (unstable) (2012-01-31 r58242)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] GSVAdata_0.99.3       hgu95a.db_2.6.3       org.Hs.eg.db_2.6.4
>   [4] RSQLite_0.11.1        DBI_0.2-5             GSEABase_1.17.3
>   [7] graph_1.33.1          annotate_1.33.2       AnnotationDbi_1.17.27
> [10] Biobase_2.15.4        BiocGenerics_0.1.14
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.13.27 stats4_2.15.0   tools_2.15.0    XML_3.9-4
> [5] xtable_1.7-0
>
> i'm pasting below our last communication about this issue.
>
> thanks!
> robert.
>
> ============================================================
> hi Martin,
>
> On Wed, 2012-03-14 at 15:58 -0700, Martin Morgan wrote:
> [...]
>
>> So the answer to the original post should have been closer to 'use
>> EntrezIdentifier for org.Hs.eg.db annotations'? As in the final line of
>>
>> library(Biobase)
>> library(org.Hs.eg.db)
>> library(GSVAdata)
>> data(c2BroadSets)
>> mapped_genes<- mappedkeys(org.Hs.egSYMBOL)
>> exp<- matrix(rnorm(1000), nrow=100,
>>       dimnames=list(mapped_genes[1:100], paste("sample", 1:10, sep="")))
>> eset<- new("ExpressionSet", exprs=exp, annotation="org.Hs.eg.db")
>> gsc<- mapIdentifiers(c2BroadSets, EntrezIdentifier(annotation(eset)))
>>
>> class?GeneIdentifierType says AnnotationIdentifier is meant for
>> Affymetrix chip packages (there seem to be some underlying problems
>> anyway, GeneSet(eset) creates AnnotationIdentifier).
> so if you revert now to the former behavior of GSEABase, how can i
> programmatically derive whether the outcome of
>
> annotation(eset)
>
> should be the input to
>
> AnnotationIdentifer()
>
> or to
>
> EntrezIdentifier()
>
> or to any other among the *Identifier() functions described in
> class?GeneIdentifierType

My plan is to revert back and provide a helper function to facilitate 
the mapping (in devel, the *db packages have objects named after them 
that contain information about their class

 > library(org.Hs.eg.db)
 > class(org.Hs.eg.db)
[1] "OrgDb"
attr(,"package")
[1] "AnnotationDbi"
 > library(hgu95av2.db)
 > class(hgu95av2.db)
[1] "ChipDb"
attr(,"package")
[1] "AnnotationDbi"

This wasn't available before, where AnnotationIdentifier / 
EntrezIdentifier were playing the role of identifying the class). 
Probably GSEABase will not be fixed until tomorrow at the earliest.

Martin

>
>
> ??
>
>
> thanks,
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>

-- 
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109