[BioC] GOstat with replicates

Naomi Altman naomi at stat.psu.edu
Thu Sep 13 04:55:00 CEST 2007


I really do need to be able to have duplicate names in the gene set 
and the gene universe, with the restriction that for any gene, the 
number of copies in the gene universe must be greater than or equal to
the number in the gene set.  I will look at Category and see what I can do.
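
To make the restriction above concrete, here is a minimal sketch in Python
(not the R code in Category or GOstats). It checks that no gene ID has more
copies in the gene set than in the universe, then computes a hypergeometric
upper-tail p-value in which every copy counts as its own draw. The function
names, the toy IDs and the "annotated" set are hypothetical. Counting copies
as independent draws is an assumption of this sketch, and, as Seth cautions
below, the resulting p-value may not be meaningful.

```python
from collections import Counter
from math import comb


def excess_copies(gene_set, universe):
    """Return {id: (copies_in_set, copies_in_universe)} for every ID that
    violates the restriction copies_in_universe >= copies_in_set."""
    set_counts = Counter(gene_set)
    universe_counts = Counter(universe)
    return {
        gene: (n, universe_counts[gene])
        for gene, n in set_counts.items()
        if n > universe_counts[gene]
    }


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the annotation of interest."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy data: duplicated IDs stand for several unigenes hitting the same gene.
universe = ["g1", "g1", "g2", "g3", "g3", "g3"]
gene_set = ["g1", "g1", "g3"]
annotated = {"g1", "g3"}  # hypothetical IDs annotated to one GO term

violations = excess_copies(gene_set, universe)

# Every copy is counted separately (assumption of this sketch).
N = len(universe)
n = len(gene_set)
K = sum(1 for g in universe if g in annotated)
k = sum(1 for g in gene_set if g in annotated)
p_value = hypergeom_upper_tail(k, K, n, N)
```

With this toy data, violations is empty, so the gene set satisfies the
restriction. A gene set such as ["g2", "g2"] would be reported as a
violation, because the universe holds only one copy of g2.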

Thanks Robert and Seth

--Naomi

At 07:25 PM 9/12/2007, Robert Gentleman wrote:
>Hi,
>    Amplifying a bit on this (and I am not so sure I yet understand
>Naomi's use case), it seems likely that the issue here is not that one
>needs duplicates in either the Universe or the gene set, but rather,
>that in this case the naming scheme is not sufficient and one would like
>to change it (so that different transcripts had some opportunity to be
>identified).
>    This is possible, but it does reveal one of the weaknesses of our
>current approach.  We will need to move our GO annotation to a more
>general mapping scheme (one based on the protein, not the gene), as it
>is likely that different splice variants have different functions (and
>hence different GO categorizations).  It is still important to consider
>whether those different splice variants (or other differences) can be
>detected by the array (in the case of microarray analysis), and if not
>then it will be important to map to the right level of resolution.
>    My guess is that we will be moving slowly in that direction over the
>next year or so, and folks that have specific needs should let us know
>what their use cases are.
>
>    best wishes
>      Robert
>
>
>Seth Falcon wrote:
> > Hi Naomi,
> >
> > Naomi Altman <naomi at stat.psu.edu> writes:
> >
> >> There are times when it makes sense to have genes duplicated in both
> >> the universe and the set of interest - e.g. if the geneIds come from
> >> BLAST hits of unigenes of an unsequenced species against the genes of
> >> a sequenced species.
> >>
> >> I fiddled a bit with GOstat, but was not able to see how to change
> >> the code to allow this.  (I can see where duplication was removed in
> >> the gene set but not in the universe.)
> >> If someone could tell me where to look in the code, I would be happy
> >> to contribute back the modified code allowing duplication.
> >
> > I think you will want to look in the Category package where a fair
> > amount of the infrastructure is located for the GO-based hyperGTest.
> >
> > In particular, you may want to look at .makeValidParams in
> > HyperGParams-accessors.R
> >
> > That said, I find the duplicated gene scenario hard to understand and
> > would worry that the method as implemented won't give useful results.
> >
> >
> > + seth
> >
>
>--
>Robert Gentleman, PhD
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M2-B876
>PO Box 19024
>Seattle, Washington 98109-1024
>206-667-7700
>rgentlem at fhcrc.org

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


