[BioC] Odds Ratio in GOstat [resolved?]

Tue Dec 12 19:41:11 CET 2006

On Tuesday 12 December 2006 12:38, Robert Gentleman wrote:
> Hi,
>    In principle (and I think in practice too) it is straightforward to
> modify GOstats (or any hypergeometric testing) to handle the situation
> where you believe that different ESTs represent different isoforms.
>
>    Basically you need to ensure that both the universe and the
> interesting gene list contain one value for each entity (ESTs here) of
> interest. Standard mapping to GO terms is via EntrezGene IDs (AFAIK), so
> you cannot use them as they are. You can, however, modify them so that you
> get unique names for each EST (and keep the mapping to terms).
>    E.g., if EG X had three ESTs on my array, I might rename them X_1, X_2
> and X_3, and make sure that these are in my universe.
>
>    But I guess, if I thought sequence was really that important, I would
> look at some sort of grouping other than GO.  I don't know, for example,
> how well homology would work, and I suspect that no one has done a
> comparative study. I would also worry about ISS annotations (in addition
> to IEA ones).
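
A minimal sketch of the renaming idea quoted above, in plain Python rather
than R/GOstats, might look like the following.  It is not the GOstats
implementation: the EST names, the gene-to-term table and the term label
"term_A" are all made up for illustration, and the test is an ordinary
one-sided hypergeometric test in which each EST counts as its own entity.

```python
from collections import defaultdict
from math import comb


def make_unique_ids(est_to_gene):
    """Give every EST a unique name of the form <gene>_<k> (X_1, X_2, ...)."""
    counts = defaultdict(int)
    unique = {}
    for est, gene in est_to_gene.items():
        counts[gene] += 1
        unique[est] = f"{gene}_{counts[gene]}"
    return unique


def hypergeom_upper_tail(universe_size, annotated, selected, overlap):
    """P(X >= overlap) when drawing `selected` entities without replacement
    from a universe in which `annotated` entities carry the term."""
    total = comb(universe_size, selected)
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, k) * comb(universe_size - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical data: three ESTs map to gene X, one each to Y, Z and W.
est_to_gene = {"est_a": "X", "est_b": "X", "est_c": "X",
               "est_d": "Y", "est_e": "Z", "est_f": "W"}
# Hypothetical gene-level annotation; the renamed IDs inherit it.
gene_to_terms = {"X": {"term_A"}, "Y": {"term_A"}, "Z": set(), "W": set()}

unique_id = make_unique_ids(est_to_gene)
id_to_terms = {unique_id[est]: gene_to_terms[gene]
               for est, gene in est_to_gene.items()}

# The universe and the interesting list both use the renamed IDs.
universe = set(id_to_terms)
interesting = {unique_id[est] for est in ("est_a", "est_b", "est_d")}

annotated = {i for i in universe if "term_A" in id_to_terms[i]}
p_value = hypergeom_upper_tail(len(universe), len(annotated),
                               len(interesting),
                               len(annotated & interesting))
print(sorted(universe), p_value)
```

The only point of the sketch is that the universe and the interesting list
are built from the same renamed identifiers, so gene X contributes three
entities to the test instead of one.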

Aren't the GO annotations typically made against a protein rather than against
a gene?  I think so, but perhaps someone with more knowledge could comment.
That being the case, one could certainly BLAST the probe sequences against the
proteins to determine a better sequence-based match.  However, if one searches
the Gene Ontology database (geneontology.org) for a gene like "BRCA1", for
example, one actually gets several hits (representing different proteins), all
with slightly different ontology entries.  This phenomenon is likely due to a
mixture of important biology and varying levels of evidence, which makes the
sequence-based matching exercise seem questionable at best.

Sean