hi Seth,

i will try to write a function that replaces your getGoToEntrezMap so i
can use it for my analyses. i will send you the code once i've written it,
so you can use it if you want to. the reason i'm not using AnnBuilder is
that i'm collecting all the annotation in our annotation database (using
RdbiPgSQL/maDB), and this annotation is updated automatically.
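
a rough sketch of the kind of function i have in mind, using only base R.
the name filterGoToEntrez and the toy map are made up for illustration; in
practice the list would come from GOALLENTREZID or from our database, and
this is not the actual Category code:

```r
# Hypothetical helper: keep only the GO terms annotated to at least one of
# the supplied EntrezGene IDs, and restrict each term to those IDs.
filterGoToEntrez <- function(go2entrez, entrezIds) {
    entrezIds <- unique(as.character(entrezIds))
    entrezIds <- entrezIds[!is.na(entrezIds)]
    # intersect() also drops NA placeholders for unmapped GO terms
    filtered <- lapply(go2entrez, function(ids) {
        intersect(as.character(ids), entrezIds)
    })
    # drop GO terms that have no gene left after filtering
    filtered[vapply(filtered, length, integer(1)) > 0]
}

# Toy map standing in for the real GO-to-EntrezGene list (invented data)
toyMap <- list("GO:0000001" = c("10", "20"),
               "GO:0000002" = NA,
               "GO:0000003" = c("30", "40"))
res <- filterGoToEntrez(toyMap, c("20", "40", "50"))
```

the same function could be called once with the gene universe and once with
the selected genes.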

sincerely, jo

On 6/4/07, Seth Falcon <sfalcon@fhcrc.org> wrote:
>
> Hi Johannes,
>
> You are right that the current Category/GOstats implementations rely
> on Bioconductor annotation data packages being available.  Taking the
> time to generate an annotation data package using AnnBuilder would
> have other benefits aside from being able to use the GOstats code, but
> I can sympathize with wanting a way to use these tools without going
> through that step first.
>
> I'm not opposed to the idea of finding a way to let the GOstats tools
> operate without an annotation data package, but at present won't have
> time to implement anything (what is there now suits our needs fairly
> well).  So patches are welcome. :-)
>
> "Johannes Rainer" <johannes.rainer@tcri.at> writes:
> > thanks for your suggestion, this would be a solution,
> > but as far as i understand, each time the hyperGTest function is called
> > the functions from the GOstats and Category packages map the submitted
> > ids to GO terms using the annotation packages (e.g. the hgu133plus2
> > annotation package). actually the mapping is performed in the
> > getGoToEntrezMap function (Category package), and this function maps
> > EntrezGene IDs to GO terms by first mapping affy IDs to GO terms and
> > then affy IDs to EntrezGene IDs.
>
> Yes, the mapping is recomputed for each call and this could probably
> be improved.  Indeed, as we transition to SQLite-based annotation data
> packages, many of the contortions of the current code can be avoided
> entirely.  I'm not sure we can avoid computing the mapping for each
> call because we need to filter the mapping based on the provided list
> of gene IDs.
>
> > when i submit the EntrezGene IDs of the selected genes and those of the
> > gene universe, i would not need the information from the annotation
> > packages that map affy ids to entrezgene ids and affy ids to GO terms.
> > the mapping between GO terms and EntrezGene IDs can be performed using
> > the GO package, i.e.
> >
> >     GOLL <- as.list(get("GOALLENTREZID", mode="environment"))
> >     # remove all the GO ids that are not mapped to any EntrezGene ID
> >     GOLL <- GOLL[!is.na(GOLL)]
> >     # x are EntrezGene IDs, either from the gene universe or the
> >     # selected ones
> >     PresentGO <- sapply(GOLL, function(z) {
> >         if (length(z) == 0 || all(is.na(z)))
> >             return(FALSE)
> >         any(x %in% z)
> >     })
> >
> >     GOLL <- GOLL[PresentGO]
> >
> > GOLL is then a list of all GO terms for the EntrezGene IDs specified
> > with x (containing all ontologies: MF, CC and BP)
>
> Aside:
>
>   The GOALLENTREZID map should probably be replaced with organism-
>   and ontology-specific maps.  The current map is huge and if we were
>   to use it as you are suggesting, I suspect it would be even slower
>   than the current map generation to go through and select the
>   desired ontology, eliminate GO IDs with no annotations in the
>   selected gene list, etc.
>
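
regarding the aside: one option might be to split the big map by ontology
once up front, so each call only has to filter the smaller map. a minimal
sketch with invented toy data; subsetByOntology and goOntology are
hypothetical names, and in practice the ontology of each GO id would have to
be looked up from the GO annotation rather than hard-coded:

```r
# Hypothetical helper: restrict a GO-to-EntrezGene list to one ontology.
# goOntology is a named character vector mapping GO id -> "BP", "MF" or "CC".
subsetByOntology <- function(go2entrez, goOntology,
                             ontology = c("BP", "MF", "CC")) {
    ontology <- match.arg(ontology)
    keep <- goOntology[names(go2entrez)] == ontology
    # GO ids without a known ontology give NA and are dropped
    go2entrez[!is.na(keep) & keep]
}

# Toy data standing in for the real maps (invented)
goOntology <- c("GO:0000001" = "BP", "GO:0000002" = "MF", "GO:0000003" = "BP")
goToEntrez <- list("GO:0000001" = c("10", "20"),
                   "GO:0000002" = c("20"),
                   "GO:0000003" = c("30"),
                   "GO:0000004" = c("40"))  # no ontology known, dropped
bpMap <- subsetByOntology(goToEntrez, goOntology, "BP")
```
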
> --
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> http://bioconductor.org
>



-- 
Johannes Rainer, Msc
Tyrolean Cancer Research Institute
Innrain 66, 6020 Innsbruck, Austria
Tel.: +43 512 570485 33
Email: johannes.rainer@tcri.at
          johannes.rainer@tugraz.at


