[BioC] Odds Ratio in GOstat [resolved?]
Sean Davis
sdavis2 at mail.nih.gov
Tue Dec 12 19:41:11 CET 2006
On Tuesday 12 December 2006 12:38, Robert Gentleman wrote:
> Hi,
> In principle (and I think in practice too) it is straightforward to
> modify GOstats (or any hypergeometric testing) to handle the situation
> where you believe that different ESTs represent different isoforms.
>
> Basically you need to ensure that both the universe and the
> interesting gene list contain one value for all entities (ESTs here) of
> interest. Standard mapping to GO terms is via EntrezGene IDs (AFAIK), so
> you cannot use them directly; you can, however, modify them so that you
> get unique names for each EST (and keep the mapping to terms).
> E.g., if EntrezGene ID X had three ESTs on my array, I might rename them
> X_1, X_2 and X_3, and make sure that these are in my universe.
>
> But I guess, if I think sequence is really that important, I would
> look at some sort of groupings other than GO. I don't know, for example
> how well homology would work and I suspect that no one has done a
> comparative study. I would also worry about ISS annotations (in addition
> to IEA ones).
Aren't GO annotations typically made against a protein rather than a
gene? I think so, but perhaps someone with more knowledge could comment. That
being the case, one could certainly blast the probe sequences against the
proteins to determine a better sequence-based match. However, if one
searches the GeneOntology.org database for a gene like "BRCA1", for example,
one actually gets several hits (representing different proteins), all with
slightly different ontology entries. This phenomenon is likely due to a
mixture of important biology and varying levels of evidence, making the
exercise seem questionable at best.
Sean