[BioC] GSVA: using Entrez ID's as identifiers

Thu Nov 17 18:31:11 CET 2011

hi Som, thanks for the information.

even though you are not running the latest version of R with the latest
version of the package, the problem does not seem to be related to that
fact and i'm able reproduce it with the latest version of both things.
in any case, it is always good to try to work with the last version of
all the software.

the short answer to your question is: try to call the gsva() function
this way (i'm using below the command line you put on your first email):

c3gsc2 <- geneIds(c3gsc2)

new <- gsva(data.m, c3gsc2, abs.ranking=TRUE, min.sz=1, max.sz=Inf,
no.bootstraps=0, bootstrap.percent = .632, parallel.sz=0,
parallel.type="SOCK", verbose=TRUE, mx.diff=TRUE)

the long answer is that, in principle, there is no method on gsva() to
accept a call with the expression data as a matrix and the gene set as a
GeneSetCollection object. When the expression data comes as a matrix the
gene sets should come as a list, which is what we achieve in the line
i've added before the call to gsva() by transforming the
GeneSetCollection object into a list object. Under such a circumstance
you should have got a more informative error saying something like there
is no such gsva() method that accepts "matrix" and "GeneSetCollection",
which is something that we should definitely add at some point to the
package. Instead the method that accepts "matrix" and "list" is
triggered but it breaks at the first line of code that tries to operate
on the list of gene sets because that is not a gene list. This strange
behavior comes from the (strange for me) fact that a GeneSetCollection
object qualifies also as a list, for instance:

library(GSVAdata)
data(c2BroadSets)
class(c2BroadSets)
[1] "GeneSetCollection"
attr(,"package")
[1] "GSEABase"
is.list(c2BroadSets)
[1] TRUE

anyway, let me know if the fix i propose you above does not work.

thanks,
robert.

On Thu, 2011-11-17 at 09:38 -0500, somnath bandyopadhyay wrote:
> Hi Robert,
>  
> Following is the information you asked for: 
>  
> > traceback()
> 7: match(x, y)
> 6: na.omit(match(x, y))
> 5: FUN(X[[1L]], ...)
> 4: lapply(gset.idx.list, function(x, y) na.omit(match(x, y)),
> rownames(expr))
> 3: .local(expr, gset.idx.list, ...)
> 2: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf, 
>        no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, 
>        parallel.type = "SOCK", verbose = TRUE, mx.diff = TRUE)
> 1: gsva(data.m, c3gsc2, abs.ranking = TRUE, min.sz = 1, max.sz = Inf, 
>        no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, 
>        parallel.type = "SOCK", verbose = TRUE, mx.diff = TRUE)
> > sessionInfo ()
> R version 2.13.1 (2011-07-08)
> Platform: i386-pc-mingw32/i386 (32-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252
> LC_NUMERIC=C                          
> [5] LC_TIME=English_United States.1252    
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> base     
> other attached packages:
> [1] GSVA_1.0.1           RColorBrewer_1.0-5   limma_3.8.3
> genefilter_1.34.0    GSEABase_1.14.0      graph_1.30.0
> annotate_1.30.1     
> [8] AnnotationDbi_1.14.1 Biobase_2.12.2      
> loaded via a namespace (and not attached):
> [1] DBI_0.2-5        RSQLite_0.10.0   splines_2.13.1
> survival_2.36-10 tools_2.13.1     XML_3.4-2.2      xtable_1.6-0    
> 
>  
> Looking forward to your reply.
>  
> Best Regards,
> Som.
> 
>  
> > Date: Wed, 16 Nov 2011 22:04:06 +0100
> > From: robert.castelo at upf.edu
> > To: genome1976 at hotmail.com
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] GSVA: using Entrez ID's as identifiers
> > 
> > hi Som,
> > 
> > i'm cc'ing the BioC mailing list, please remember to do it when you 
> > answer since this works as a knowledge base for everyone else.
> > 
> > i'd need two bits of information from you to find out what might be 
> > happening: one, right after the error pops up, please write in the R
> shell:
> > 
> > traceback()
> > 
> > and paste here the output of this function.
> > 
> > two, please paste also here the ouput of
> > 
> > sessionInfo()
> > 
> > 
> > both steps are in fact recommended by the BioC mailinglist posting
> guide:
> > 
> > http://www.bioconductor.org/help/mailing-list/posting-guide
> > 
> > robert.
> > 
> > On 11/16/11 9:04 PM, somnath bandyopadhyay wrote:
> > > Hi Robert,
> > >
> > > I am trying to use GSVA on a microarray dataset and I am trying to
> use
> > > one of the Broad gene set collections for the enrichment purposes.
> > >
> > >
> > > library(GSEABase)
> > > library(Biobase)
> > > library(genefilter)
> > > library(limma)
> > > library(RColorBrewer)
> > > library(graph)
> > > library(GSVA)
> > >
> > > c3gsc2 <-
> > >
> getGmt("c2.cp.kegg.v3.0.entrez.gmt",collectionType=BroadCollection(category="c3"),geneIdType=EntrezIdentifier())
> > > class(c3gsc2)
> > > c3gsc2
> > >
> > > data <- read.table("gsva_infliximab_data.txt", header=T,
> row.names=1,
> > > sep="\t")# the data matrix is filtered for low expressors etc. and
> I am
> > > using Entrez Gene ID as row identifiers.
> > > class(data)
> > > data.m <- as.matrix(data)
> > >
> > > new <- gsva(data.m,
> > >
> c3gsc2,abs.ranking=TRUE,min.sz=1,max.sz=Inf,no.bootstraps=0,bootstrap.percent
> > >
> = .632,parallel.sz=0,parallel.type="SOCK",verbose=TRUE,mx.diff=TRUE)
> > >
> > >
> > > I keep getting the following error at this step
> > > Error in match(x, y) : 'match' requires vector arguments
> > >
> > > Could you pleaase tell me what I am doing wrong?
> > >
> > > Thanks so much,
> > > Som.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > > From: robert.castelo at upf.edu
> > > > To: kellert at ohsu.edu
> > > > Date: Wed, 16 Nov 2011 08:43:48 +0100
> > > > CC: wendy2.qiao at gmail.com; bioconductor at r-project.org
> > > > Subject: Re: [BioC] GSVA: using Entrez ID's as identifiers
> > > >
> > > > hi Tom,
> > > >
> > > > i'm a bit unsure what are you asking in relationship with this
> thread,
> > > > but i guess you're interested in creating a custom annotation
> package.
> > > > For that purpose i'd recommend you to read through the vignettes
> of the
> > > > AnnotationDbi package. i'm not an expert in creating custom
> annotation
> > > > packages so if you encounter problems to go ahead i think you
> should
> > > > start a new thread with the specific question or problem you
> want to
> > > > solve.
> > > >
> > > > cheers,
> > > > robert.
> > > >
> > > > On Tue, 2011-11-15 at 14:24 -0800, Tom Keller wrote:
> > > > > Greetings,
> > > > > The annotation for the miRNA chip does not seem to have the
> same
> > > amount of
> > > > > information as the hgu95 db. Is there some help available for
> > > mapping miRNA
> > > > > probes to their target genes?
> > > > >
> > > > > thanks
> > > > > Thomas (Tom) Keller, PhD
> > > > > kellert at ohsu.edu
> > > > > 503.494.2442
> > > > > 6588 R Jones Hall (BSc/CROET)
> > > > > MMI DNA Services
> > > > > Member of OHSU Shared Resources
> > > > >
> > > > > On Nov 14, 2011, at 11:28 PM, Robert Castelo wrote:
> > > > >
> > > > > > hi Wendy,
> > > > > >
> > > > > > i'm afraid you need to get a little bit acquainted with the
> way
> > > in which
> > > > > > annotations are handled in BioC. a good starting point could
> be
> > > looking
> > > > > > a the vignette "AnnotationDbi: How to use the .db annotation
> > > packages"
> > > > > > from the AnnotationDbi package.
> > > > > >
> > > > > > the short answer to your problem is that hgu95a is not the
> only
> > > platform
> > > > > > for which annotations exist in BioC, basically there is an
> annotation
> > > > > > package for each platform supported by BioC (you can look
> all of
> > > them up
> > > > > > by going to
> > > http://www.bioconductor.org/packages/release/BiocViews.html
> > > > > > and clicking on "AnnotationData") but in order to use on
> such
> > > annotation
> > > > > > packages you need
> > > > > >
> > > > > > 1. install it once in your system via source() and
> biocLite() just as
> > > > > > with every software package
> > > > > >
> > > > > > 2. load it via the library() function.
> > > > > >
> > > > > > in order to use the human organism-level package i mentioned
> in my
> > > > > > previous email you need to install it first and then load it
> > > prior to do
> > > > > > anything else with it.
> > > > > >
> > > > > > let me know if this still does not solve your problem.
> > > > > >
> > > > > > cheers,
> > > > > > robert.
> > > > > >
> > > > > > On Mon, 2011-11-14 at 18:40 -0500, Wendy Qiao wrote:
> > > > > >> Hi Robert,
> > > > > >>
> > > > > >> Thank you for your reply. I happened to convert all the
> genes to
> > > > > >> hgu95a probe IDs as I found that this is the only platform
> that
> > > works
> > > > > >> with ExpressionSet. It would be great that we could make
> the
> > > entrez ID
> > > > > >> works. Following is my error that I got with your code.
> > > > > >>
> > > > > >>
> > > > > >> Thank you.
> > > > > >> Wendy
> > > > > >>
> > > > > >>
> > > > > >>> BcellSet
> > > > > >> ExpressionSet (storageMode: lockedEnvironment)
> > > > > >> assayData: 12148 features, 7 samples
> > > > > >> element names: exprs
> > > > > >> protocolData: none
> > > > > >> phenoData
> > > > > >> sampleNames: Illumi_PREBCEL_1 Illumi_PREBCEL_2 ...
> Affy_PREBCEL_4 (7
> > > > > >> total)
> > > > > >> varLabels: CellType Platform Replicates
> > > > > >> varMetadata: labelDescription
> > > > > >> featureData: none
> > > > > >> experimentData: use 'experimentData(object)'
> > > > > >> Annotation: org.Hs.eg.db
> > > > > >>>
> > > > > >>
> > >
> preBcell.KEGG<-gsva(BcellSet,KEGGc2BroadSets,abs.ranking=FALSE)$es.obs
> > > > > >> Mapping identifiers between gene sets and feature names
> > > > > >> Error in GeneSetCollection(lapply(what, mapIdentifiers,
> to, ...,
> > > > > >> verbose = verbose)) :
> > > > > >> error in evaluating the argument 'object' in selecting a
> method for
> > > > > >> function 'GeneSetCollection': Error in get(mapName, envir =
> pkgEnv,
> > > > > >> inherits = FALSE) :
> > > > > >> object 'org.Hs.egENTREZID' not found
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On 14 November 2011 12:27, Robert Castelo
> <robert.castelo at upf.edu>
> > > > > >> wrote:
> > > > > >> hi Wendy,
> > > > > >>
> > > > > >> sorry for my late answer. in principle there is no problem
> for
> > > > > >> the
> > > > > >> gsva() function to take Entrez IDs in your expression data
> > > > > >> matrix.
> > > > > >>
> > > > > >> if the expression data comes as a matrix, and rows are
> > > > > >> annotated with
> > > > > >> Entrez IDs and the gene sets are also annotated with Entrez
> > > > > >> IDs, there
> > > > > >> should be absolutely no problem.
> > > > > >>
> > > > > >> if the expression data comes as an ExpressionSet object
> where
> > > > > >> the
> > > > > >> 'features' are not Affy probe IDs but just EntrezIDs. just
> > > > > >> make sure
> > > > > >> that the annotation slot has the corresponding
> organism-level
> > > > > >> package.
> > > > > >> for instance, in the case of human:
> > > > > >>
> > > > > >> annotation(eset) <- "org.Hs.eg.db"
> > > > > >>
> > > > > >> let me know if you have any problem with this.
> > > > > >>
> > > > > >> cheers,
> > > > > >> robert.
> > > > > >>
> > > > > >> On Fri, 2011-11-11 at 14:44 -0500, Wendy Qiao wrote:
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>> I am using the GSVA package for some analysis. I found
> that
> > > > > >> the package
> > > > > >>> only takes the gene expression matrix annotated with
> > > > > >> affymetrix probe IDs,
> > > > > >>> although the gene set collection is made of Entrez IDs. I
> > > > > >> imagine there a
> > > > > >>> step in the package for converting the Affymetrix probe
> IDs
> > > > > >> to Entrez IDs.
> > > > > >>> As my data are from the Illumina platform, I am wondering
> if
> > > > > >> an expression
> > > > > >>> matrix annotated with Entrez IDs can be used directly.
> > > > > >>>
> > > > > >>> Thank you,
> > > > > >>> Wendy
> > > > > >>>
> > > > > >>
> > > > > >>> [[alternative HTML version deleted]]
> > > > > >>>
> > > > > >>> _______________________________________________
> > > > > >>> Bioconductor mailing list
> > > > > >>> Bioconductor at r-project.org
> > > > > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > > >>> Search the archives:
> > > > > >>
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > > > _______________________________________________
> > > > > > Bioconductor mailing list
> > > > > > Bioconductor at r-project.org
> > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > > > >
> > > > >
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > 
>