[BioC] GO.db and Gostats question

Marc Carlson mcarlson at fhcrc.org
Tue Sep 8 19:09:17 CEST 2009


Hi Adi,

I am reposting this reply to the list so that others may benefit from
any clarifications that we make here.  It's a good idea to keep things
on list in the future so that others can learn from your questions as well.

To answer your question: I am afraid that was not what I was trying to
say.  Sorry if I confused you.  The org.Hs.eg.db package can be used
instead of a human based array package.  But in that case,
"org.Hs.eg.db" would be your annotation argument when making your
GOHyperGParams object.  The point I was trying to make before was that
either an org package or a chip based annotation package is needed to
supply the GO to gene mapping.  But you need to supply one of those if
you are going to do this, because the GO.db package only contains the
Gene Ontology itself and not information about which GO IDs map to
particular genes. 

Hope this clarifies things,


  Marc



Tarca, Adi wrote:
>  
> Thanks Marc,
> So what you are saying is that the GO.db and org.Hs.eg.db does not contain all info required to map any human entrez id to all GO terms, and you really need the array package as well?
>
> Thanks,
> Adi
>
>
> -----Original Message-----
> From: Marc Carlson [mailto:mcarlson at fhcrc.org] 
> Sent: Thursday, September 03, 2009 6:01 PM
> To: Tarca, Adi
> Cc: 'bioconductor at stat.math.ethz.ch'
> Subject: Re: [BioC] GO.db and Gostats question
>
> Hi Adi,
>
> In addition to the gene list you provide, and the GO DAG, there is also information (from the annotation package that you specify) to indicate which GO terms should be associated with which gene IDs. 
>
>   Marc
>
>
>
>
> Tarca, Adi wrote:
>   
>> Hi all,
>>
>> I have a question regarding the way in which GOstats scans the GO tree and counts how many genes in a given list belong to those go terms.
>> Here is how I would look for all BP GO terms that have one or more genes in a given list called ALL:     
>>
>> library(org.Hs.eg.db)
>> library(GO.db)
>> ALL=c("6462","54784","386678","1572","57830","51764","85443","387644",
>> "54807","401106","11012","26529")
>> xyy<- as.list(org.Hs.egGO2ALLEGS)
>> x1 <- as.list(GOTERM)
>> alGO=names(xyy)
>> out<-list()
>> for (i in 1:length(alGO)){
>> mygo=alGO[i]
>> if(Ontology(x1[[mygo]])=="BP"){
>> genesgo=intersect(as.vector(xyy[[mygo]]),ALL)
>> out[[i]]<-genesgo
>> }
>> }
>> table(unlist(lapply(out,length)))
>>
>>
>> I am not concerned here with the speed of the process by merely finding the same BP terms that GOstats would find. I see that hyperGTest uses both an universeGeneIds argument but also an annotation argument, and I am not sure why the annotation argument is needed given that the universeGeneIds is provided.
>> Note that I can not get the list of all terms tested by GOstats using something like:
>>
>> library(GOstats)
>> params <- new("GOHyperGParams", geneIds = ALL,  universeGeneIds = ALL, 
>> annotation = "hgu133plus2.db",  ontology = "BP", pvalueCutoff = 1, 
>> conditional = FALSE,  testDirection = "over") hgCondOver <- 
>> hyperGTest(params)
>>
>> Because all p values will be exactly 1, hyperGTest will return no entries since none are less than the pvalueCutoff. 
>>
>>
>>
>>
>> Thanks a lot,
>>
>> Adi
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>   
>>     
>
>
>
>



More information about the Bioconductor mailing list