[BioC] GOstats - defining the gene universe
James W. MacDonald
jmacdon at med.umich.edu
Fri Oct 5 18:24:14 CEST 2007
Hi Rachael,
Rachael McBride wrote:
> Hi,
>
> I have a quick question that I can't seem to find an answer to by
> searching the BioC lists. I want to use GOstats on a gene list. I've
> read the vignette and understand that defining the gene universe is an
> important step. The vignette outlines various non-specific filtering
> steps that can be done on an expression set in order to define the gene
> universe. My question is: are the non-specific filtering steps done on a
> normalized or un-normalized expression set?
You would almost always want to use normalized expression data.
The vignette actually includes some steps that by all rights would have
occurred earlier in the analysis (namely the part where low-variance
genes are removed).
Usually the analysis proceeds something like this:
1. Preprocess: normalize, background correct, etc.
2. Filter 'uninteresting' genes to reduce multiplicity.
3. Make comparisons.
4. Run the hypergeometric test on the gene sets from the comparison step.
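To make the last step concrete, here is a rough sketch of the upper-tail hypergeometric probability for a single GO term, and of how the answer shifts with the size of the universe. It is plain Python for illustration only, not the GOstats API; the function name and all the counts are made up.

```python
from math import comb

def hypergeom_upper_tail(universe_size, annotated_in_universe,
                         selected_size, annotated_in_selected):
    """P(X >= annotated_in_selected) when selected_size genes are drawn
    without replacement from a universe of universe_size genes, of which
    annotated_in_universe carry the GO term of interest."""
    total = comb(universe_size, selected_size)
    upper = min(annotated_in_universe, selected_size)
    favourable = sum(
        comb(annotated_in_universe, i)
        * comb(universe_size - annotated_in_universe, selected_size - i)
        for i in range(annotated_in_selected, upper + 1)
    )
    return favourable / total

# Hypothetical counts: the same selected list (3 genes, 2 annotated to the
# term) tested against a small filtered universe and a larger unfiltered one.
p_small_universe = hypergeom_upper_tail(10, 4, 3, 2)
p_large_universe = hypergeom_upper_tail(100, 4, 3, 2)
print(p_small_universe, p_large_universe)
```

The same gene list gives a smaller p-value against the larger universe, which is why the universe should match the set of genes that actually went into the comparisons.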
In this case the universe you would start with would be the data you
used to make the comparisons, which already lacks the genes you filtered
out because they were uninteresting by some measure. At this point you
simply want to remove any duplicates, genes lacking Entrez Gene IDs, and
genes lacking GO terms.
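The clean-up described above might look something like the following sketch. It is illustrative Python rather than Bioconductor code: the probe names, Entrez Gene IDs and the define_universe helper are all hypothetical, and keeping the first probe seen for each gene is an arbitrary choice here (a real analysis would use a principled rule, such as keeping the most variable probe).

```python
def define_universe(probes):
    """Return a dict mapping Entrez Gene ID -> probe ID for the universe.

    probes is an iterable of (probe_id, entrez_id, go_terms) tuples,
    already restricted to the genes used in the comparisons.
    """
    universe = {}
    for probe_id, entrez_id, go_terms in probes:
        if not entrez_id:
            continue  # no Entrez Gene ID
        if not go_terms:
            continue  # no GO annotation
        if entrez_id in universe:
            continue  # duplicate gene: keep the first probe (assumption)
        universe[entrez_id] = probe_id
    return universe

# Made-up annotation records for illustration.
probes = [
    ("probe_a", "1001", ["GO:0008150"]),
    ("probe_b", "1001", ["GO:0008150"]),  # second probe for the same gene
    ("probe_c", None, ["GO:0003674"]),    # lacks an Entrez Gene ID
    ("probe_d", "1002", []),              # lacks GO terms
    ("probe_e", "1003", ["GO:0005575"]),
]
universe = define_universe(probes)
print(sorted(universe))
```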
Best,
Jim
>
> Thanks,
> Rachael
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623