[BioC] GOstat with replicates
Robert Gentleman
rgentlem at fhcrc.org
Thu Sep 13 01:25:20 CEST 2007
Hi,
Amplifying a bit on this (and I am not sure I fully understand
Naomi's use case yet), it seems likely that the issue here is not that one
needs duplicates in either the universe or the gene set, but rather
that in this case the naming scheme is not sufficient and one would like
to change it (so that different transcripts have some opportunity to be
identified).
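
To make the point concrete, here is a minimal sketch in Python (not the
GOstats/Category code) of a hypergeometric over-representation test. It
assumes identifiers are treated as sets, so a repeated gene ID collapses to
one entry, whereas transcript-level names stay distinct. All identifiers
(g1, g1.a, g1.b, ...) and both helper functions are hypothetical and exist
only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the annotation of interest."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def overrepresentation_p(selected, universe, annotated):
    """Upper-tail p-value; every input is reduced to a set of unique IDs,
    so duplicates cannot contribute more than once."""
    universe = set(universe)
    selected = set(selected) & universe
    annotated = set(annotated) & universe
    k = len(selected & annotated)
    return hypergeom_upper_tail(k, len(annotated), len(selected), len(universe))


# Gene-level naming: the two hits on g1 collapse into a single gene.
gene_universe = [f"g{i}" for i in range(1, 21)]
gene_annotated = ["g1", "g2", "g3", "g4", "g5"]
gene_selected = ["g1", "g1", "g2", "g7"]
p_gene = overrepresentation_p(gene_selected, gene_universe, gene_annotated)

# Transcript-level naming (assumed scheme): g1.a and g1.b are separate items
# in the universe, the annotation, and the selected set.
tx_universe = ["g1.a", "g1.b"] + [f"g{i}" for i in range(2, 21)]
tx_annotated = ["g1.a", "g1.b", "g2", "g3", "g4", "g5"]
tx_selected = ["g1.a", "g1.b", "g2", "g7"]
p_tx = overrepresentation_p(tx_selected, tx_universe, tx_annotated)

print(p_gene, p_tx)
```

In this toy setting the two naming schemes give different counts, and hence
different p-values, from the same underlying hits. That is why the choice of
identifier matters more here than whether duplicates are permitted.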
This is possible, but it does reveal one of the weaknesses of our
current approach. We will need to move our GO annotation to a more
general mapping scheme (one based on the protein, not the gene), as it
is likely that different splice variants have different functions (and
hence different GO categorizations). It is still important to consider
whether those different splice variants (or other differences) can be
detected by the array (in the case of microarray analysis); if they
cannot, it will be important to map to the right level of resolution.
My guess is that we will be moving slowly in that direction over the
next year or so, and folks who have specific needs should let us know
what their use cases are.
best wishes
Robert
Seth Falcon wrote:
> Hi Naomi,
>
> Naomi Altman <naomi at stat.psu.edu> writes:
>
>> There are times when it makes sense to have genes duplicated in both
>> the universe and the set of interest - e.g. if the geneIds come from
>> BLAST hits of unigenes of an unsequenced species against the genes of
>> a sequenced species.
>>
>> I fiddled a bit with GOstat, but was not able to see how to change
>> the code to allow this. (I can see where duplication was removed in
>> the gene set but not in the universe.)
>> If someone could tell me where to look in the code, I would be happy
>> to contribute back the modified code allowing duplication.
>
> I think you will want to look in the Category package where a fair
> amount of the infrastructure is located for the GO-based hyperGTest.
>
> In particular, you may want to look at .makeValidParams in
> HyperGParams-accessors.R
>
> That said, I find the duplicated gene scenario hard to understand and
> would worry that the method as implemented won't give useful results.
>
>
> + seth
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org