[BioC] GSVA: using Entrez ID's as identifiers

Wed Nov 16 22:04:06 CET 2011

hi Som,

i'm cc'ing the BioC mailing list, please remember to do it when you 
answer since this works as a knowledge base for everyone else.

i'd need two bits of information from you to find out what might be 
happening: one, right after the error pops up, please write in the R shell:

traceback()

and paste here the output of this function.

two, please also paste here the output of

sessionInfo()

both steps are in fact recommended by the BioC mailing list posting guide:

http://www.bioconductor.org/help/mailing-list/posting-guide
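
if the error comes up in a script rather than at the prompt, the same two pieces of information can be collected programmatically. what follows is only a rough sketch in base R, not something GSVA provides: `failing_step` is a hypothetical stand-in for the call that fails (your gsva() call would go there), and `stack` and `msg` are names made up for this example.

```r
# Hypothetical stand-in for the failing call: match() on a non-vector
# argument raises an error, so there is something to report.
failing_step <- function() match(1:3, function(x) x)

stack <- NULL
msg <- tryCatch(
  withCallingHandlers(
    failing_step(),
    # Calling handler: runs while the failing frames are still on the
    # stack, so sys.calls() records the chain of calls that led to the
    # error (the information traceback() shows interactively).
    error = function(e) stack <<- sys.calls()
  ),
  # Exiting handler: keep only the error message and carry on.
  error = function(e) conditionMessage(e)
)

cat("Error message:", msg, "\n")
cat("Call stack:\n")
print(stack)

# R version, platform and versions of attached packages.
print(sessionInfo())
```

at an interactive prompt none of this is needed: typing traceback() and then sessionInfo() right after the error is enough.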

robert.

On 11/16/11 9:04 PM, somnath bandyopadhyay wrote:
> Hi Robert,
>
> I am trying to use GSVA on a microarray dataset and I am trying to use
> one of the Broad gene set collections for the enrichment purposes.
>
>
> library(GSEABase)
> library(Biobase)
> library(genefilter)
> library(limma)
> library(RColorBrewer)
> library(graph)
> library(GSVA)
>
> c3gsc2 <-
> getGmt("c2.cp.kegg.v3.0.entrez.gmt",collectionType=BroadCollection(category="c3"),geneIdType=EntrezIdentifier())
> class(c3gsc2)
> c3gsc2
>
> # the data matrix is filtered for low expressors etc. and I am
> # using Entrez Gene IDs as row identifiers.
> data <- read.table("gsva_infliximab_data.txt", header=TRUE, row.names=1,
> sep="\t")
> class(data)
> data.m <- as.matrix(data)
>
> new <- gsva(data.m, c3gsc2, abs.ranking=TRUE, min.sz=1, max.sz=Inf,
> no.bootstraps=0, bootstrap.percent=.632, parallel.sz=0,
> parallel.type="SOCK", verbose=TRUE, mx.diff=TRUE)
>
>
> I keep getting the following error at this step
> Error in match(x, y) : 'match' requires vector arguments
>
> Could you please tell me what I am doing wrong?
>
> Thanks so much,
> Som.
>
>  > From: robert.castelo at upf.edu
>  > To: kellert at ohsu.edu
>  > Date: Wed, 16 Nov 2011 08:43:48 +0100
>  > CC: wendy2.qiao at gmail.com; bioconductor at r-project.org
>  > Subject: Re: [BioC] GSVA: using Entrez ID's as identifiers
>  >
>  > hi Tom,
>  >
>  > i'm a bit unsure what you are asking in relation to this thread,
>  > but i guess you're interested in creating a custom annotation package.
>  > For that purpose i'd recommend reading through the vignettes of the
>  > AnnotationDbi package. i'm not an expert in creating custom annotation
>  > packages, so if you run into problems i think you should
>  > start a new thread with the specific question or problem you want to
>  > solve.
>  >
>  > cheers,
>  > robert.
>  >
>  > On Tue, 2011-11-15 at 14:24 -0800, Tom Keller wrote:
>  > > Greetings,
>  > > The annotation for the miRNA chip does not seem to have the same amount of
>  > > information as the hgu95 db. Is there some help available for mapping miRNA
>  > > probes to their target genes?
>  > >
>  > > thanks
>  > > Thomas (Tom) Keller, PhD
>  > > kellert at ohsu.edu
>  > > 503.494.2442
>  > > 6588 R Jones Hall (BSc/CROET)
>  > > MMI DNA Services
>  > > Member of OHSU Shared Resources
>  > >
>  > > On Nov 14, 2011, at 11:28 PM, Robert Castelo wrote:
>  > >
>  > > > hi Wendy,
>  > > >
>  > > > i'm afraid you need to get a little bit acquainted with the way in which
>  > > > annotations are handled in BioC. a good starting point could be looking
>  > > > at the vignette "AnnotationDbi: How to use the .db annotation packages"
>  > > > from the AnnotationDbi package.
>  > > >
>  > > > the short answer to your problem is that hgu95a is not the only platform
>  > > > for which annotations exist in BioC. basically there is an annotation
>  > > > package for each platform supported by BioC (you can look all of them up
>  > > > by going to http://www.bioconductor.org/packages/release/BiocViews.html
>  > > > and clicking on "AnnotationData"), but in order to use one of these
>  > > > annotation packages you need to
>  > > >
>  > > > 1. install it once in your system via source() and biocLite() just as
>  > > > with every software package
>  > > >
>  > > > 2. load it via the library() function.
>  > > >
>  > > > in order to use the human organism-level package i mentioned in my
>  > > > previous email you need to install it first and then load it prior to
>  > > > doing anything else with it.
>  > > >
>  > > > let me know if this still does not solve your problem.
>  > > >
>  > > > cheers,
>  > > > robert.
>  > > >
>  > > > On Mon, 2011-11-14 at 18:40 -0500, Wendy Qiao wrote:
>  > > >> Hi Robert,
>  > > >>
>  > > >> Thank you for your reply. I happened to convert all the genes to
>  > > >> hgu95a probe IDs as I found that this is the only platform that works
>  > > >> with ExpressionSet. It would be great if we could make the Entrez IDs
>  > > >> work. The following is the error that I got with your code.
>  > > >>
>  > > >>
>  > > >> Thank you.
>  > > >> Wendy
>  > > >>
>  > > >>
>  > > >>> BcellSet
>  > > >> ExpressionSet (storageMode: lockedEnvironment)
>  > > >> assayData: 12148 features, 7 samples
>  > > >> element names: exprs
>  > > >> protocolData: none
>  > > >> phenoData
>  > > >> sampleNames: Illumi_PREBCEL_1 Illumi_PREBCEL_2 ... Affy_PREBCEL_4 (7
>  > > >> total)
>  > > >> varLabels: CellType Platform Replicates
>  > > >> varMetadata: labelDescription
>  > > >> featureData: none
>  > > >> experimentData: use 'experimentData(object)'
>  > > >> Annotation: org.Hs.eg.db
>  > > >>> preBcell.KEGG<-gsva(BcellSet,KEGGc2BroadSets,abs.ranking=FALSE)$es.obs
>  > > >> Mapping identifiers between gene sets and feature names
>  > > >> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ...,
>  > > >> verbose = verbose)) :
>  > > >> error in evaluating the argument 'object' in selecting a method for
>  > > >> function 'GeneSetCollection': Error in get(mapName, envir = pkgEnv,
>  > > >> inherits = FALSE) :
>  > > >> object 'org.Hs.egENTREZID' not found
>  > > >>
>  > > >>
>  > > >>
>  > > >>
>  > > >> On 14 November 2011 12:27, Robert Castelo <robert.castelo at upf.edu>
>  > > >> wrote:
>  > > >> hi Wendy,
>  > > >>
>  > > >> sorry for my late answer. in principle there is no problem for the
>  > > >> gsva() function to take Entrez IDs in your expression data matrix.
>  > > >>
>  > > >> if the expression data comes as a matrix, and rows are annotated with
>  > > >> Entrez IDs and the gene sets are also annotated with Entrez IDs, there
>  > > >> should be absolutely no problem.
>  > > >>
>  > > >> if the expression data comes as an ExpressionSet object where the
>  > > >> 'features' are not Affy probe IDs but just Entrez IDs, just make sure
>  > > >> that the annotation slot has the corresponding organism-level package.
>  > > >> for instance, in the case of human:
>  > > >>
>  > > >> annotation(eset) <- "org.Hs.eg.db"
>  > > >>
>  > > >> let me know if you have any problem with this.
>  > > >>
>  > > >> cheers,
>  > > >> robert.
>  > > >>
>  > > >> On Fri, 2011-11-11 at 14:44 -0500, Wendy Qiao wrote:
>  > > >>> Hi all,
>  > > >>>
>  > > >>> I am using the GSVA package for some analysis. I found that the package
>  > > >>> only takes the gene expression matrix annotated with Affymetrix probe IDs,
>  > > >>> although the gene set collection is made of Entrez IDs. I imagine there is a
>  > > >>> step in the package for converting the Affymetrix probe IDs to Entrez IDs.
>  > > >>> As my data are from the Illumina platform, I am wondering if an expression
>  > > >>> matrix annotated with Entrez IDs can be used directly.
>  > > >>>
>  > > >>> Thank you,
>  > > >>> Wendy
>  > > >>>
>  > > >>> _______________________________________________
>  > > >>> Bioconductor mailing list
>  > > >>> Bioconductor at r-project.org
>  > > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>  > > >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>  > > >>>