[BioC] Odds Ratio in GOstat [resolved?]
Naomi Altman
naomi at stat.psu.edu
Tue Dec 12 16:22:45 CET 2006
Dear Björn,
You have hit the nail on the head here. These
are plants, and we are pretty sure that there has been genome expansion.
The reliability of the unigene clustering is less
than 100%, of course, but in some cases we have
full-length sequences, so those clusters are confirmed.
Thanks for your thoughts on this.
--Naomi
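
For readers of the archive, the two ways of counting discussed in this thread can be sketched as below. This is only an illustration in Python, not GOstats code: the function name count_upregulated is hypothetical, the data are the toy example quoted further down (A1, A2, B1 upregulated; B2 and C not), and the rule that a reference gene counts as upregulated when any of its matching ESTs is upregulated is an assumption made here to reproduce the "2 out of 3" figure.

```python
def count_upregulated(est_to_gene, upregulated):
    """Return ((up, total) per EST/unigene, (up, total) per reference gene).

    est_to_gene maps each EST/unigene to its best hit in the sequenced species.
    Assumption: a reference gene is "up" if ANY of its matching ESTs is up.
    """
    # Counting on the ESTs/unigenes themselves (terms transferred onto them).
    est_level = (
        sum(1 for est in est_to_gene if est in upregulated),
        len(est_to_gene),
    )
    # Counting after collapsing duplicate best hits onto the reference genes.
    all_genes = set(est_to_gene.values())
    up_genes = {est_to_gene[est] for est in upregulated if est in est_to_gene}
    gene_level = (len(up_genes), len(all_genes))
    return est_level, gene_level


# Toy example from the thread: annotation has genes A, B, C.
est_to_gene = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C": "C"}
upregulated = {"A1", "A2", "B1"}

est_level, gene_level = count_upregulated(est_to_gene, upregulated)
print(est_level)   # (3, 5)
print(gene_level)  # (2, 3)
```

Counting per EST/unigene corresponds to the "3 out of 5" answer; collapsing to the reference genes gives "2 out of 3" and also yields a list without duplicate ids.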
At 05:27 AM 12/12/2006, Björn Usadel wrote:
>Dear Naomi,
>
>
>If I understand you correctly, your problem seems to be that you
>investigate the classifications of the best hits in the sequenced
>organism and not the classes of your actual ESTs.
>
>In this case, the route I usually take is to transfer the ontological
>terms onto the ESTs (or better unigenes) and use these for testing. (I
>use neither GO nor GOstats though).
>From a biological point of view, I think this also makes sense. Suppose
>your sequenced species has one isoform of a particular enzyme (B), which
>has already expanded to two isoforms (B1 and B2) in your organism, and
>these are not yet completely subfunctionalized. In this case your
>non-sequenced organism really has GO:molecular_function:whatever twice.
>Also, I am more interested in the distribution of genes in the organism
>I am looking at than in an already sequenced one. As an extreme case, if
>you inferred GO terms by blasting plants against vertebrates, you would
>run into the problem of the super-expanded gene families in plants (which
>are real).
>
>So to answer your question I would say 3 out of 5.
>
>However, it is not trivial to transfer ontological terms, especially if
>the originals were already "inferred from electronic annotation". Also, if
>you are not sure about the sequence clustering (e.g. ESTs B1 and B2
>should really represent one unigene), things start getting shaky.
>But there are annotation packages like Interpro2GO, blast2go, and you
>name it.
>So to sum this up, I think you should rely on good old sequence based
>bioinformatics.
>
>Just my 5 cents though....
>
>Cheers,
>Björn
>
>Naomi Altman wrote:
> > The duplicate genes problem is an interesting one. The reason the
> > selected gene list includes duplicates is because it comes from
> > blasting an EST set from an unsequenced species against a sequenced
> > species. Each duplicate is the nearest homolog of an EST, but the ESTs
> > are believed to represent multiple genes. How to handle this for GO
> > enrichment is an interesting question.
> >
> > e.g. Annotation has genes A B C.
> > We observe that matches A1 A2 and B1 are upregulated, but B2 and C
> > are not. Should we say that 3 out of 5 are upregulated, or 2 out of 3?
> >
> > --Naomi
> >
> > At 07:43 PM 12/11/2006, Seth Falcon wrote:
> >> The selected gene list contained duplicate ids. I'm pretty sure this
> >> is the problem. The Category + GOstats code should detect such input
> >> errors and give a sensible error message. I will add such checking
> >> very soon.
> >>
> >> + seth
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > Naomi S. Altman 814-865-3791 (voice)
> > Associate Professor
> > Dept. of Statistics 814-863-7114 (fax)
> > Penn State University 814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >
>
>--
>-+-+-+-+-+-+-+-+-+-+-+-
>Björn Usadel, PhD
>
>Max Planck Institute of Molecular Plant Physiology
>System Regulation Group
>
>Am Mühlenberg 1
>D-14476 Golm
>Germany
>
>Tel (+49 331) 567-8114
>
>Email usadel at mpimp-golm.mpg.de
>WWW mapman.mpimp-golm.mpg.de
>
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111