[BioC] GOstat question

Sean Davis sdavis2 at mail.nih.gov
Fri Mar 21 15:02:46 CET 2008


On Fri, Mar 21, 2008 at 9:51 AM, nicolas servant
<Nicolas.Servant at curie.fr> wrote:
>  > sessionInfo()
>  R version 2.5.0 (2007-04-23)
>  i386-pc-solaris2.10
>
>  locale:
>  C
>
>  attached base packages:
>  [1] "splines"   "tools"     "stats"     "graphics"  "grDevices" "utils"
>  [7] "datasets"  "methods"   "base"
>
>  other attached packages:
>     GOstats    Category      Matrix     lattice  genefilter    survival
>     "2.2.6"     "2.2.3" "0.99875-2"    "0.15-5"    "1.14.1"      "2.31"
>        KEGG        RBGL    annotate     Biobase          GO       graph
>    "1.16.1"    "1.12.0"    "1.14.1"    "1.14.0"    "1.16.0"    "1.14.2"
>  hgu133plus2
>    "1.16.0"
>
>
>  Here, a *reproductible* example :
>
>  require(hgu133plus2)
>
>  ##UNIVERSE
>  designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
>  ##GO
>  haveGo <- sapply(mget(designALL, hgu133plus2GO),
>        function(x) {
>        if (length(x) == 1 && is.na(x)){
>       FALSE
>       }
>  else{
>     TRUE
>  }
>  })
>  numNoGO <- sum(!haveGo)
>  print(paste(numNoGO, "pbsets in Univers have no GO ids"))
>  designALL<-designALL[haveGo]
>
>  entrezUniverse <-
>  na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
>  if (any(duplicated(entrezUniverse))){
>    stop("error in gene universe: can't have duplicate Ent")
>  }
>
>  ##ALIST
>  selectedEntrezIds <- entrezUniverse[1:150]
>
>  require(GOstats)
>
>  params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
>  hgOver.BP<-hyperGTest(params.BP.over)
>
>  length(selectedEntrezIds)
>  length(hgOver.BP at geneIds)
>
>  As you can see, i performed a non-specific filtering by removing the
>  probesets with no GO annotations (as presented in the vignette !).
>  I'm not interested here by using an other filtering approach like
>  Student test, IQR filtering, etc ...
>
>   > length(selectedEntrezIds)
>  [1] 150
>   > length(hgOver.BP at geneIds)
>  [1] 124
>
>  To summarize, all the entrezID of my list have a GO annotation, but the
>  hyperGTest function removes 26 additional genes.
>  I guess there are some explanations ?

All the entrezID of your list have a GO annotation; that does not
imply that they all have a GO BP annotation.  So, your difference of
26 genes is probably due to the lack of annotation within the GO BP
ontology.  I did not run your example to prove that is the case, but
it makes sense that the lists can, in general be different lengths.  I
do not see a problem here.

Sean

>  James W. MacDonald a écrit :
>  > Hi Nicolas,
>  >
>  > nicolas servant wrote:
>  >> Sorry. I use the BioC GOStats package (2.2.6).
>  >
>  > Yes, but you *still* refuse to follow the posting guide, so we are
>  > left to guess at what you have done. Here's a hint; give us (always)
>  > the results of calling sessionInfo(), and give us a reproducible
>  > example. The example you give below is not reproducible, as you are
>  > the only one who has these data. If we can't do what you have done to
>  > see what happened, how exactly are we to figure out why you get these
>  > results?
>  >
>  > And by reproducible example I don't mean for you to give us a link to
>  > your data. You can simply use set.seed(), randomly select some affy
>  > IDs, and then go on from there as if these were the IDs you got from
>  > an analysis.
>  >
>  >> In connection with my last mail, I have a other question :
>  >>
>  >> ##universe
>  >> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
>  >> entrezUniverse <-
>  >> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
>  >> ##myList
>  >> selectedEntrezIds <-
>  >> na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
>  >> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
>  >>
>  >> params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
>  >>
>  >> params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
>  >>
>  >>
>  >> hgOver.BP<-hyperGTest(params.BP.over)
>  >> hgOver.MF<-hyperGTest(params.MF.over)
>  >> hgOver.CC<-hyperGTest(params.CC.over)
>  >>
>  >> length(selectedEntrezIds)
>  >> [1] 3761
>  >> length(hgOver.BP at geneIds)
>  >> [1] 3122
>  >>
>  >> I guess the hyperGTest has removed these 639 missing genes. But i
>  >> don't understand why ?
>  >
>  > If you were to peruse the vignettes for the GOstats package, you will
>  > see that there are two filtering steps that you should take before
>  > running the analysis. The first is to subset to unique Entrez Gene IDs
>  > as you have done. The second is to remove those Entrez Gene IDs that
>  > have no GO annotations associated with them. My best guess is the 639
>  > missing genes have no GO annotations. But I am guessing since I can't
>  > know.
>  >
>  > Best,
>  >
>  > Jim
>  >
>  >
>  >>
>  >> Nicolas
>  >>
>  >> Robert Gentleman a écrit :
>  >>> Hi Nicolas,
>  >>>
>  >>>    Perhaps you could peruse the posting guide and provide the
>  >>> information it asks you to.  Only from that could one hope to give
>  >>> you a reasonable answer (ie, what commands did you run, what what
>  >>> the output, and your sessionInfo, at a bare minimum).  And the
>  >>> Bioconductor package is GOstats, not GOstat - that is something else
>  >>> (not Bioconductor) and if you want help with it you should ask those
>  >>> folks directly.
>  >>>
>  >>>   Robert
>  >>>
>  >>>
>  >>> nicolas servant wrote:
>  >>>
>  >>>> Hi,
>  >>>>
>  >>>> I have a question about the GOstat package.
>  >>>> I'm working with a list of probesets, and i'm interested by testing
>  >>>> the over representation of the GO terms in my list.
>  >>>> So, I changed the probesets IDs (of my lists and my Universe) into
>  >>>> ENTREZ IDs and the hyperGTest performed well.
>  >>>> For instance i have some result as :
>  >>>>
>  >>>> GO:0005635
>  >>>> Pvalue = 0.04
>  >>>> OddsRatio = 0.04
>  >>>> ExpCount =  11
>  >>>> Count = 17
>  >>>> Size = 45
>  >>>>
>  >>>> But, when i did the opposite :
>  >>>>
>  >>>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
>  >>>> entrez <-
>  >>>> unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
>  >>>> length(entrez)
>  >>>> [1] 126
>  >>>>
>  >>>> I don't understand why I find 126 entrez IDs in the Universe, and
>  >>>> no 45 as expected (SIZE) ...
>  >>>>
>  >>>> mylistentrez<-unique(intersect(entrez,mylist))
>  >>>> length(mylistentrez)
>  >>>> [1] 50
>  >>>>
>  >>>> In the same way, I find 50 entrez IDs in my list, and no 17 as
>  >>>> expected (COUNT)
>  >>>>
>  >>>> Thanks for your explanations,
>  >>>> Bests,
>  >>>>
>  >>>> Nicolas
>  >>>>
>  >>>>
>  >>>
>  >>
>  >>
>  >
>
>
>  --
>  Nicolas Servant
>  Equipe Bioinformatique
>  Institut Curie
>  26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
>
>  Email: Nicolas.Servant at curie.fr
>  Tel: 01 53 10 70 55
>  http://bioinfo.curie.fr/
>
>  _______________________________________________
>  Bioconductor mailing list
>  Bioconductor at stat.math.ethz.ch
>  https://stat.ethz.ch/mailman/listinfo/bioconductor
>  Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


More information about the Bioconductor mailing list