[BioC] GOstats - defining the gene universe

Mon Oct 8 09:51:42 CEST 2007

James W. MacDonald wrote:
> Hi Rachael,
> 
> Rachael McBride wrote:
>> Hi,
>>
>> I have a quick question that I can't seem to find an answer to by 
>> searching the BioC lists. I want to use GOstats on a gene list. I've 
>> read the vignette and understand that defining the gene universe is an 
>> important step. The vignette outlines various non-specific filtering 
>> steps that can be done on an expression set in order to define the 
>> gene universe. My question is: are the non-specific filtering steps 
>> done on a normalized or an un-normalized expression set?
> 
> You would almost always want to use normalized expression data.
> 
> The vignette actually includes some steps that by all rights would have 
> occurred earlier in the analysis (namely the part where low-variance 
> genes are removed).
> 
> Usually the analysis proceeds something like this:
> 
> 1. Preprocess: normalize, background correct, etc.
> 2. Filter 'uninteresting' genes to reduce multiplicity.
> 3. Make comparisons.
> 4. Run the hypergeometric test on the gene sets from the comparison step.
> 
> In this case the universe you would start with would be the data you 
> used to make the comparisons, which already lacks the genes you filtered 
> out because they were uninteresting by some measure. At this point you 
> simply want to remove any duplicates, genes lacking Entrez Gene IDs, and 
> genes lacking GO terms.
> 
> Best,
> 
> Jim
> 
> 
> 
>>
>> Thanks,
>> Rachael
> 
Hi Jim,

Thanks for the clarification,

Rachael.
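
The universe clean-up and hypergeometric step that Jim describes above can be sketched in a few lines. This is a minimal illustration in plain Python, not the GOstats API: the function names `build_universe` and `hypergeom_pvalue`, the probe and Entrez IDs, and the GO labels `GO:A` and `GO:B` are all hypothetical. It assumes the input already contains only the genes that survived normalization and non-specific filtering.

```python
from math import comb

def build_universe(probes):
    """Map Entrez ID -> set of GO terms for the filtered probes.

    Drops probes lacking an Entrez ID, probes lacking GO terms, and
    duplicate Entrez IDs (the first occurrence is kept).
    """
    universe = {}
    for probe_id, entrez_id, go_terms in probes:
        if entrez_id is None or not go_terms:
            continue  # no Entrez Gene ID or no GO annotation
        if entrez_id in universe:
            continue  # duplicate gene, already represented
        universe[entrez_id] = set(go_terms)
    return universe

def hypergeom_pvalue(universe, selected, term):
    """Over-representation p-value P(X >= k) for one GO term."""
    selected = {g for g in selected if g in universe}
    N = len(universe)                                   # universe size
    K = sum(1 for t in universe.values() if term in t)  # genes with the term
    n = len(selected)                                   # selected genes
    k = sum(1 for g in selected if term in universe[g]) # selected with term
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical probes: (probe ID, Entrez ID, GO terms)
probes = [
    ("p1", "101", ["GO:A"]),
    ("p2", "101", ["GO:A"]),          # duplicate Entrez ID
    ("p3", None,  ["GO:A"]),          # no Entrez ID
    ("p4", "102", []),                # no GO terms
    ("p5", "103", ["GO:A", "GO:B"]),
    ("p6", "104", ["GO:B"]),
    ("p7", "105", ["GO:B"]),
]

universe = build_universe(probes)
# Gene "102" was dropped from the universe, so it is ignored here too.
p_value = hypergeom_pvalue(universe, ["101", "102", "103"], "GO:A")
print(sorted(universe), p_value)
```

Genes in the selected list that are not in the universe are discarded before the test, so the selected set is always a subset of the universe, which is what the hypergeometric model requires.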