[BioC] "automatic association analysis"

Francois Pepin fpepin at cs.mcgill.ca
Fri Aug 25 20:09:07 CEST 2006


Hi Weiwei

> My initial question is about
> how to automatic "validate" or "test" the result I get from whatever
> methods i use, like text mining or something like that.

I think some packages may exist, but we do that by hand. Once we're
pointed to a specific pathway, we prefer to let humans handle the rest.

> how do u define the "success events" in hypergeometric test? and how
> do you make sure the sampling has no bias when you pick genes in your
> study?

That's one of the tricky issues. People usually use differentially
expressed genes as the "successes", but choosing a threshold for them
isn't obvious. One of the reasons some people do not like this approach
(and I'm starting to feel the same way) is that the values are
continuous, so changing the threshold by a hair changes your set of
genes (often changing your results significantly).
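
To make the threshold problem concrete, here is a minimal sketch in
Python (standard library only) of the hypergeometric test as it is
commonly set up for pathway enrichment. The gene names, p-values,
pathway membership and both cutoffs are made-up toy values, and the
function names are hypothetical; this is not taken from any
Bioconductor package.

```python
from math import comb

def hypergeom_pvalue(universe_size, pathway_size, selected_size, overlap):
    """Upper-tail probability P(X >= overlap) when selected_size genes are
    drawn without replacement from a universe holding pathway_size
    pathway genes."""
    total = comb(universe_size, selected_size)
    upper = min(pathway_size, selected_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def enrichment_at_cutoff(gene_pvalues, pathway, cutoff):
    """Call genes with p < cutoff 'differentially expressed' and test the
    pathway for over-representation among them."""
    universe = set(gene_pvalues)
    pathway_in_universe = pathway & universe
    selected = {g for g, p in gene_pvalues.items() if p < cutoff}
    overlap = len(selected & pathway_in_universe)
    p = hypergeom_pvalue(
        len(universe), len(pathway_in_universe), len(selected), overlap
    )
    return len(selected), overlap, p

# Toy data (assumption): 12 genes with invented differential-expression
# p-values; two pathway genes sit just above 0.05.
gene_pvalues = {
    "g1": 0.001, "g2": 0.004, "g3": 0.012, "g4": 0.030, "g5": 0.049,
    "g6": 0.051, "g7": 0.052, "g8": 0.20, "g9": 0.35, "g10": 0.50,
    "g11": 0.70, "g12": 0.90,
}
pathway = {"g1", "g6", "g7"}

for cutoff in (0.05, 0.055):
    n_selected, overlap, p = enrichment_at_cutoff(gene_pvalues, pathway, cutoff)
    print(f"cutoff={cutoff}: selected={n_selected}, "
          f"in pathway={overlap}, p={p:.3f}")
```

In this toy case, moving the cutoff from 0.05 to 0.055 pulls in two
more pathway genes and changes the enrichment p-value considerably,
which is the instability described above.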

I'm not sure what you mean about the sampling bias. If you filter in an
unbiased way and set your universe to be what is available on the chip,
you should be ok. You should also deal with duplicate probes (if any)
and multiple probes per gene (if any). Again, the archives have a
couple of fairly detailed discussions on those issues.
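
As a rough illustration of the duplicate-probe point, the sketch below
(Python, standard library only) collapses probe-level values to one
value per gene, so that each gene is counted once in both the universe
and the selected set. The probe IDs, gene symbols and the rule of
keeping the smallest p-value per gene are all assumptions made for the
example; other rules are possible, and the archived discussions are
the place to look for the trade-offs.

```python
def collapse_probes_to_genes(probe_pvalues, probe_to_gene):
    """Return one p-value per gene, keeping the smallest p-value among
    the probes mapped to that gene. Probes without a gene annotation
    are dropped. (The 'smallest p-value' rule is an assumption.)"""
    gene_pvalues = {}
    for probe, p in probe_pvalues.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probe: not part of the gene universe
        if gene not in gene_pvalues or p < gene_pvalues[gene]:
            gene_pvalues[gene] = p
    return gene_pvalues

# Toy annotation (hypothetical probe IDs and gene symbols).
probe_to_gene = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_A",   # second probe for the same gene
    "probe_3": "GENE_B",
    "probe_4": None,       # probe with no gene annotation
}
probe_pvalues = {
    "probe_1": 0.20,
    "probe_2": 0.01,
    "probe_3": 0.40,
    "probe_4": 0.03,
}

gene_pvalues = collapse_probes_to_genes(probe_pvalues, probe_to_gene)
universe = set(gene_pvalues)  # genes actually measurable on the chip
print(sorted(gene_pvalues.items()))
print(len(universe))
```

The universe is then the set of genes left after collapsing, not the
raw number of probes on the chip.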

Francois


