[BioC] GO over representation analysis

Tue Sep 2 17:37:11 CEST 2008

There is already a bovine.db0 package in the development branch of 
Bioconductor.  I didn't call it cow.db0 because I didn't want any 
"bull.db0" in the repository.  (Please forgive the pun about "bull.db0", 
I just could not resist.)   ;)  A chicken.db0 database is also 
planned.

As for the question about code for making your own source database, I 
would like to do that someday, but that particular project is not 
nearly as straightforward as it sounds, and it is not at all clear to me 
that many people would actually use it.  The vast majority of people who 
want gene annotations want them for the small number of model organisms 
where most of the initial work describing gene function has been done 
(e.g. humans, mice, rats, flies, etc.).  And most people who work on more 
exotic organisms are OK with the fact that there is usually not very much 
directly known about gene function from studies done in their own 
organism.  For this minority of scientists, it seems that better homology 
tools might serve them better in the long run.  And this is why we have 
created the new inparanoid packages to try to address this problem.

  Marc

michael watson (IAH-C) wrote:
> OK, so there is no chicken.db0 or cow.db0 package, so that rules that
> one out. 
>
> My experience is that most people have a "star" schema of information
> linking probes to genes to GO IDs, Pathways, Entrez etc etc.
>
> It *should* be simple to make an annotation package out of this, but it
> isn't.
>
> And I hate to be hyper-critical but the Vignette for AnnBuilder is
> virtually inaccessible...
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu] 
> Sent: 29 August 2008 14:07
> To: Sean Davis
> Cc: michael watson (IAH-C); bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] GO over representation analysis
>
> There is one slight additional wrinkle. AnnotationDbi currently supports
> fewer species than AnnBuilder. Building a package using AnnotationDbi is
> (at least) a two step process, of which only one is required of the end 
> user. The first step(s) are to build a database containing all relevant 
> data for a given species that is then used to populate the chip-specific
> package databases.
>
> If you are interested in an annotation package that is not yet supported
> by AnnotationDbi, then you will need to consult with Marc Carlson to get
> the primary database built.
>
> Best,
>
> Jim
>
>
>
> Sean Davis wrote:
>   
>> On Fri, Aug 29, 2008 at 8:33 AM, michael watson (IAH-C)
>> <michael.watson at bbsrc.ac.uk> wrote:
>>     
>>> Dear All
>>>
>>> I think one thing that is frustrating here is that there is not a
>>> simple guide here for people who want to create an annotation package
>>> for an array that does not yet have one.
>>>
>>> Do we use AnnotationDbi?  Or AnnBuilder?  Or is there another way?
>>>
>>> What is the "best practice" for building an annotation package?
>>>       
>> Hi, Mick.  The confusion arises because annotation packages have been
>> migrated from the environment-based packages built by AnnBuilder to
>> the newer SQLite-based packages of AnnotationDbi.  The answer depends
>> on which version of R and, therefore, which version of Bioconductor
>> you are using.  That said, the standard for the current and future
>> releases (for the near future, anyway) is to use SQLForge from
>> AnnotationDbi.
>>
>> Sean
>>
>>     
>>> -----Original Message-----
>>> From: bioconductor-bounces at stat.math.ethz.ch
>>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Marc
>>> Carlson
>>> Sent: 25 August 2008 17:03
>>> To: Heike Pospisil
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] GO over representation analysis
>>>
>>> You could just make an annotation package for the array in question by
>>> using the SQLForge code in the AnnotationDbi package.
>>> You can find instructions on how to do this here:
>>>
>>> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>>
>>> Let me know if you have any questions about SQLForge.
>>>
>>>
>>>  Marc
>>>
>>>
>>>
>>> Heike Pospisil wrote:
>>>       
>>>> Hello Bioconductors,
>>>>
>>>> I am looking for a method to perform over-representation analysis
>>>> (Gene Ontology) within R. I have data from the Maize Oligonucleotide
>>>> Array (two channel) with the GO categories for all probes on this
>>>> array. I have clustered the genes using Maanova and I am interested
>>>> in GO over-representation of the gene lists from these clusters.
>>>>
>>>> I know the GO tools from Bioconductor (e.g. GOstats), but I do not
>>>> know how to adapt the analysis to an 'unusual' array with no
>>>> annotation data package and no Entrez IDs. Any hints?
>>>>
>>>> Thanks in advance,
>>>> Heike
>>>>
>>>>         
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
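
For readers who only have a probe-to-GO table and no annotation package, 
the core of the over-representation test asked about at the start of this 
thread can be sketched by hand.  The following is a minimal illustration 
in Python, not the GOstats implementation: it runs a plain one-sided 
hypergeometric test per GO term against the set of all annotated probes 
on the array.  The function names, the probe IDs (p1..p8) and the toy 
mapping are all hypothetical.  It assumes the mapping already lists every 
term for each probe; it does not propagate annotations up the GO graph 
and does not correct for multiple testing.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total


def go_over_representation(probe_to_go, cluster_probes):
    """Return (term, hits_in_cluster, probes_with_term, p_value), best first.

    probe_to_go    : dict mapping probe ID -> set of GO IDs (the universe)
    cluster_probes : iterable of probe IDs in the gene list of interest
    """
    universe = set(probe_to_go)
    cluster = set(cluster_probes) & universe

    # Invert the mapping: GO term -> probes annotated with it.
    term_to_probes = {}
    for probe, terms in probe_to_go.items():
        for term in terms:
            term_to_probes.setdefault(term, set()).add(probe)

    results = []
    for term, probes in term_to_probes.items():
        k = len(probes & cluster)
        if k == 0:
            continue  # a term with no hits cannot be over-represented
        p = hypergeom_upper_tail(k, len(universe), len(probes), len(cluster))
        results.append((term, k, len(probes), p))
    return sorted(results, key=lambda r: r[3])


# Toy data: hypothetical probe IDs, illustrative annotations only.
probe_to_go = {
    "p1": {"GO:0006412"},
    "p2": {"GO:0006412"},
    "p3": {"GO:0006412", "GO:0008152"},
    "p4": {"GO:0008152"},
    "p5": {"GO:0008152"},
    "p6": set(),
    "p7": {"GO:0008152"},
    "p8": set(),
}
results = go_over_representation(probe_to_go, ["p1", "p2", "p3"])
for term, k, K, p in results:
    print(f"{term}  hits={k}  annotated={K}  p={p:.4f}")
```

The universe here is every probe in the mapping, including probes with no 
GO terms; restricting it to annotated probes only is a design choice that 
changes the p-values and should be made deliberately.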