[BioC] Odds Ratio in GOstat [resolved?]

Naomi Altman naomi at stat.psu.edu
Tue Dec 12 16:22:45 CET 2006


Dear Björn,

You have hit the nail on the head here.  These are plants, and we are
pretty sure that there has been genome expansion.  The reliability of the
unigene clustering is less than 100%, of course, but in some cases we have
full-length sequences, so those are confirmed.

Thanks for your thoughts on this.

--Naomi

At 05:27 AM 12/12/2006, Björn Usadel wrote:
>Dear Naomi,
>
>
>If I understand you correctly, your problem seems to be that you
>investigate the classifications of the best hits in the sequenced
>organism and not the classes of your actual ESTs.
>
>In this case, the route I usually take is to transfer the ontological
>terms onto the ESTs (or better unigenes) and use these for testing. (I
>use neither GO nor GOstats though).
>From a biological point of view I think this also makes sense. Just
>assume your sequenced species has one isoform of a particular enzyme
>(B), which in your non-sequenced organism has already expanded to two
>isoforms (B1 and B2) that are not yet completely subfunctionalized. In
>this case your non-sequenced organism really has
>GO:molecular_function:whatever twice.
>Also, I am more interested in the distribution of genes in the organism
>I am looking at than in an already sequenced one. As an extreme case, if
>you inferred GO terms by blasting plants against vertebrates, you would
>run into the problem of the greatly expanded gene families in plants
>(which are real).
>
>So to answer your question I would say 3 out of 5.
>
>However, it is not trivial to transfer ontological terms, especially if
>the original annotations were already "inferred from electronic
>annotation". Also, if you are not sure about the sequence clustering
>(e.g. ESTs B1 and B2 should really represent one unigene), things start
>getting shaky. But there are annotation packages like Interpro2GO,
>blast2go, and you name it.
>So to sum this up, I think you should rely on good old sequence-based
>bioinformatics.
>
>Just my 5 cents though....
>
>Cheers,
>Björn
>
>Naomi Altman wrote:
> > The duplicate genes problem is an interesting one.  The reason the
> > selected gene list includes duplicates is because it comes from
> > blasting an EST set from an unsequenced species against a sequenced
> > species.  Each duplicated id is the nearest homolog of several ESTs,
> > which are nevertheless supposed to represent multiple genes.  How to
> > handle this for GO enrichment is an interesting question.
> >
> > e.g.  The annotation has genes A, B, and C.
> > We observe that matches A1, A2, and B1 are upregulated, but B2 and C
> > are not.  Should we say that 3 out of 5 are upregulated, or 2 out of 3?
> >
> > --Naomi
> >
> > At 07:43 PM 12/11/2006, Seth Falcon wrote:
> >> The selected gene list contained duplicate ids.  I'm pretty sure this
> >> is the problem.  The Category + GOstats code should detect such input
> >> errors and give a sensible error message.  I will add such checking
> >> very soon.
> >>
> >> + seth
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > Naomi S. Altman                                814-865-3791 (voice)
> > Associate Professor
> > Dept. of Statistics                              814-863-7114 (fax)
> > Penn State University                         814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >
>
>--
>-+-+-+-+-+-+-+-+-+-+-+-
>Björn Usadel, PhD
>
>Max Planck Institute of Molecular Plant Physiology
>System Regulation Group
>
>Am Mühlenberg 1
>D-14476 Golm
>Germany
>
>Tel    (+49 331) 567-8114
>
>Email  usadel at mpimp-golm.mpg.de
>WWW    mapman.mpimp-golm.mpg.de

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
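
The counting question raised in the thread (3 out of 5 versus 2 out of 3)
can be illustrated with a small sketch. This is not GOstats code, and it is
in Python rather than R. The names best_hit, upregulated, and the three
helper functions are hypothetical, and the rule that a gene counts as
upregulated when any of its ESTs is upregulated is an assumption made only
for this example.

```python
from collections import Counter

# Hypothetical data mirroring the example in the thread: each EST (or
# unigene) from the unsequenced species is mapped to its best-hit gene
# in the sequenced species.
best_hit = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C": "C"}
upregulated = {"A1", "A2", "B1"}


def duplicated_ids(ids):
    """Return the ids that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [i for i in counts if counts[i] > 1]


def est_level_count(best_hit, upregulated):
    """Count every EST separately (the '3 out of 5' view)."""
    hits = sum(1 for est in best_hit if est in upregulated)
    return hits, len(best_hit)


def gene_level_count(best_hit, upregulated):
    """Collapse ESTs onto their best-hit gene (the '2 out of 3' view).

    Assumption: a gene is called upregulated if ANY of its ESTs is.
    """
    genes = set(best_hit.values())
    up_genes = {best_hit[est] for est in upregulated if est in best_hit}
    return len(up_genes), len(genes)


# Mapping the selected ESTs straight to gene ids yields duplicates,
# which is the kind of input Seth describes as the likely problem.
selected_gene_ids = [best_hit[est] for est in sorted(upregulated)]

print(duplicated_ids(selected_gene_ids))
print(est_level_count(best_hit, upregulated))
print(gene_level_count(best_hit, upregulated))
```

Counting at the EST level keeps the expanded copies as separate entries,
which is the view Björn argues for. Counting at the gene level removes the
duplicate ids but discards the information that B1 and B2 behave
differently.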


