[BioC] GOstat question
Sean Davis
sdavis2 at mail.nih.gov
Fri Mar 21 15:02:46 CET 2008
On Fri, Mar 21, 2008 at 9:51 AM, nicolas servant
<Nicolas.Servant at curie.fr> wrote:
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-solaris2.10
>
> locale:
> C
>
> attached base packages:
> [1] "splines" "tools" "stats" "graphics" "grDevices" "utils"
> [7] "datasets" "methods" "base"
>
> other attached packages:
> GOstats Category Matrix lattice genefilter survival
> "2.2.6" "2.2.3" "0.99875-2" "0.15-5" "1.14.1" "2.31"
> KEGG RBGL annotate Biobase GO graph
> "1.16.1" "1.12.0" "1.14.1" "1.14.0" "1.16.0" "1.14.2"
> hgu133plus2
> "1.16.0"
>
>
> Here, a *reproductible* example :
>
> require(hgu133plus2)
>
> ##UNIVERSE
> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
> ##GO
> haveGo <- sapply(mget(designALL, hgu133plus2GO),
> function(x) {
> if (length(x) == 1 && is.na(x)){
> FALSE
> }
> else{
> TRUE
> }
> })
> numNoGO <- sum(!haveGo)
> print(paste(numNoGO, "pbsets in Univers have no GO ids"))
> designALL<-designALL[haveGo]
>
> entrezUniverse <-
> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
> if (any(duplicated(entrezUniverse))){
> stop("error in gene universe: can't have duplicate Ent")
> }
>
> ##ALIST
> selectedEntrezIds <- entrezUniverse[1:150]
>
> require(GOstats)
>
> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
> hgOver.BP<-hyperGTest(params.BP.over)
>
> length(selectedEntrezIds)
> length(hgOver.BP at geneIds)
>
> As you can see, i performed a non-specific filtering by removing the
> probesets with no GO annotations (as presented in the vignette !).
> I'm not interested here by using an other filtering approach like
> Student test, IQR filtering, etc ...
>
> > length(selectedEntrezIds)
> [1] 150
> > length(hgOver.BP at geneIds)
> [1] 124
>
> To summarize, all the entrezID of my list have a GO annotation, but the
> hyperGTest function removes 26 additional genes.
> I guess there are some explanations ?
All the entrezID of your list have a GO annotation; that does not
imply that they all have a GO BP annotation. So, your difference of
26 genes is probably due to the lack of annotation within the GO BP
ontology. I did not run your example to prove that is the case, but
it makes sense that the lists can, in general be different lengths. I
do not see a problem here.
Sean
> James W. MacDonald a écrit :
> > Hi Nicolas,
> >
> > nicolas servant wrote:
> >> Sorry. I use the BioC GOStats package (2.2.6).
> >
> > Yes, but you *still* refuse to follow the posting guide, so we are
> > left to guess at what you have done. Here's a hint; give us (always)
> > the results of calling sessionInfo(), and give us a reproducible
> > example. The example you give below is not reproducible, as you are
> > the only one who has these data. If we can't do what you have done to
> > see what happened, how exactly are we to figure out why you get these
> > results?
> >
> > And by reproducible example I don't mean for you to give us a link to
> > your data. You can simply use set.seed(), randomly select some affy
> > IDs, and then go on from there as if these were the IDs you got from
> > an analysis.
> >
> >> In connection with my last mail, I have a other question :
> >>
> >> ##universe
> >> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
> >> entrezUniverse <-
> >> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
> >> ##myList
> >> selectedEntrezIds <-
> >> na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
> >> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
> >>
> >> params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
> >>
> >> params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
> >>
> >>
> >> hgOver.BP<-hyperGTest(params.BP.over)
> >> hgOver.MF<-hyperGTest(params.MF.over)
> >> hgOver.CC<-hyperGTest(params.CC.over)
> >>
> >> length(selectedEntrezIds)
> >> [1] 3761
> >> length(hgOver.BP at geneIds)
> >> [1] 3122
> >>
> >> I guess the hyperGTest has removed these 639 missing genes. But i
> >> don't understand why ?
> >
> > If you were to peruse the vignettes for the GOstats package, you will
> > see that there are two filtering steps that you should take before
> > running the analysis. The first is to subset to unique Entrez Gene IDs
> > as you have done. The second is to remove those Entrez Gene IDs that
> > have no GO annotations associated with them. My best guess is the 639
> > missing genes have no GO annotations. But I am guessing since I can't
> > know.
> >
> > Best,
> >
> > Jim
> >
> >
> >>
> >> Nicolas
> >>
> >> Robert Gentleman a écrit :
> >>> Hi Nicolas,
> >>>
> >>> Perhaps you could peruse the posting guide and provide the
> >>> information it asks you to. Only from that could one hope to give
> >>> you a reasonable answer (ie, what commands did you run, what what
> >>> the output, and your sessionInfo, at a bare minimum). And the
> >>> Bioconductor package is GOstats, not GOstat - that is something else
> >>> (not Bioconductor) and if you want help with it you should ask those
> >>> folks directly.
> >>>
> >>> Robert
> >>>
> >>>
> >>> nicolas servant wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have a question about the GOstat package.
> >>>> I'm working with a list of probesets, and i'm interested by testing
> >>>> the over representation of the GO terms in my list.
> >>>> So, I changed the probesets IDs (of my lists and my Universe) into
> >>>> ENTREZ IDs and the hyperGTest performed well.
> >>>> For instance i have some result as :
> >>>>
> >>>> GO:0005635
> >>>> Pvalue = 0.04
> >>>> OddsRatio = 0.04
> >>>> ExpCount = 11
> >>>> Count = 17
> >>>> Size = 45
> >>>>
> >>>> But, when i did the opposite :
> >>>>
> >>>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
> >>>> entrez <-
> >>>> unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
> >>>> length(entrez)
> >>>> [1] 126
> >>>>
> >>>> I don't understand why I find 126 entrez IDs in the Universe, and
> >>>> no 45 as expected (SIZE) ...
> >>>>
> >>>> mylistentrez<-unique(intersect(entrez,mylist))
> >>>> length(mylistentrez)
> >>>> [1] 50
> >>>>
> >>>> In the same way, I find 50 entrez IDs in my list, and no 17 as
> >>>> expected (COUNT)
> >>>>
> >>>> Thanks for your explanations,
> >>>> Bests,
> >>>>
> >>>> Nicolas
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
> --
> Nicolas Servant
> Equipe Bioinformatique
> Institut Curie
> 26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
>
> Email: Nicolas.Servant at curie.fr
> Tel: 01 53 10 70 55
> http://bioinfo.curie.fr/
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list