[Bioc-devel] GSEABase::mapIdentifiers() stopped mapping identifiers
Martin Morgan
mtmorgan at fhcrc.org
Tue Mar 20 18:49:29 CET 2012
On 3/20/2012 8:54 AM, Robert Castelo wrote:
> hi, since we're approaching release and a few days happened since my last
> email exchange about this, let me bring again what the problem is that
> is breaking the GSVA vignette and thus, its build on R-devel with a
> specific example reproducing the problem:
>
> library(Biobase)
> library(GSEABase)
> library(GSVAdata)
>
> data(c2BroadSets)
> data(leukemia)
> class(leukemia_eset)
> [1] "ExpressionSet"
> attr(,"package")
> [1] "Biobase"
> annotation(leukemia_eset)
> [1] "hgu95a"
>
>
> gsc<- mapIdentifiers(c2BroadSets,
> AnnotationIdentifier(annotation(leukemia_eset)))
>
> head(lapply(geneIds(gsc), head))
> $NAKAMURA_CANCER_MICROENVIRONMENT_UP
> [1] "5167" "100288400" "338328" "388" "10631" "440387"
>
> $NAKAMURA_CANCER_MICROENVIRONMENT_DN
> [1] "55215" "9319" "81610" "9455" "64759" "8767"
>
> $WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP
> [1] "5142" "6781" "580" "6713" "112950" "11182"
>
> $WEST_ADRENOCORTICAL_TUMOR_MARKERS_DN
> [1] "125" "2619" "5919" "4856" "5156" "4046"
>
> $WINTER_HYPOXIA_UP
> [1] "7022" "404550" "5738" "9456" "5230" "10856"
>
> $WINTER_HYPOXIA_DN
> [1] "5168" "9452" "3112" "91526" "55843" "9459"
>
> note that these identifiers are still the Entrez Gene Identifiers from
> the GeneSetCollection object 'c2BroadSets' while they should have been
> translated into hgu95a probeset IDs since these are the kind of identifiers
> in the ExpressionSet object 'leukemia_eset'.
>
> the vignette in GSVA breaks because no gene in these "mapped" gene sets
> matches any probeset ID in the ExpressionSet. I think this could potentially
> break the code in the vignette of other package using the function mapIdentifiers().
>
> the most straightforward solution is to revert back to GSEABase 1.17.2 and then
> find later a workaround for the original issue of idempotent maps between
> GeneSetCollection and ExpressionSet objects
> (see https://mailman.stat.ethz.ch/pipermail/bioc-devel/2012-February/003151.html)
>
> sessionInfo()
> R Under development (unstable) (2012-01-31 r58242)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GSVAdata_0.99.3 hgu95a.db_2.6.3 org.Hs.eg.db_2.6.4
> [4] RSQLite_0.11.1 DBI_0.2-5 GSEABase_1.17.3
> [7] graph_1.33.1 annotate_1.33.2 AnnotationDbi_1.17.27
> [10] Biobase_2.15.4 BiocGenerics_0.1.14
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.13.27 stats4_2.15.0 tools_2.15.0 XML_3.9-4
> [5] xtable_1.7-0
>
> i'm pasting below our last communication about this issue.
>
> thanks!
> robert.
>
> ============================================================
> hi Martin,
>
> On Wed, 2012-03-14 at 15:58 -0700, Martin Morgan wrote:
> [...]
>
>> So the answer to the original post should have been closer to 'use
>> EntrezIdentifier for org.Hs.eg.db annotations'? As in the final line of
>>
>> library(Biobase)
>> library(org.Hs.eg.db)
>> library(GSVAdata)
>> data(c2BroadSets)
>> mapped_genes<- mappedkeys(org.Hs.egSYMBOL)
>> exp<- matrix(rnorm(1000), nrow=100,
>> dimnames=list(mapped_genes[1:100], paste("sample", 1:10, sep="")))
>> eset<- new("ExpressionSet", exprs=exp, annotation="org.Hs.eg.db")
>> gsc<- mapIdentifiers(c2BroadSets, EntrezIdentifier(annotation(eset)))
>>
>> class?GeneIdentifierType says AnnotationIdentifier is meant for
>> Affymetrix chip packages (there seem to be some underlying problems
>> anyway, GeneSet(eset) creates AnnotationIdentifier).
> so if you revert now to the former behavior of GSEABase, how can i
> programmatically derive whether the outcome of
>
> annotation(eset)
>
> should be the input to
>
> AnnotationIdentifer()
>
> or to
>
> EntrezIdentifier()
>
> or to any other among the *Identifier() functions described in
> class?GeneIdentifierType
My plan is to revert back and provide a helper function to facilitate
the mapping (in devel, the *db packages have objects named after them
that contain information about their class
> library(org.Hs.eg.db)
> class(org.Hs.eg.db)
[1] "OrgDb"
attr(,"package")
[1] "AnnotationDbi"
> library(hgu95av2.db)
> class(hgu95av2.db)
[1] "ChipDb"
attr(,"package")
[1] "AnnotationDbi"
This wasn't available before, where AnnotationIdentifier /
EntrezIdentifier were playing the role of identifying the class).
Probably GSEABase will not be fixed until tomorrow at the earliest.
Martin
>
>
> ??
>
>
> thanks,
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
More information about the Bioc-devel
mailing list