[BioC] Gene Pre-filtering
Talloen, Willem [PRDBE]
WTALLOEN at PRDBE.JNJ.COM
Fri Jun 27 11:41:42 CEST 2008
I believe gene filtering is advisable as long as you do NOT USE THE LABELS of the arrays.
You should, however, always be careful not to filter too stringently; a low FDR is nice but useless if you have excluded some of the interesting genes.
Another powerful gene filtering method, which uses probe-level information for Affy chips, is I/NI calls:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/21/2897
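
To make the "do not use the labels" point concrete, here is a minimal Python sketch of a label-free filter followed by a Benjamini-Hochberg adjustment. It is only an illustration under stated assumptions: the function names, the variance criterion, the 50% cutoff, and the toy numbers are all hypothetical, and this is neither nsFilter nor the I/NI method.

```python
import statistics


def variance_filter(expr, keep_fraction=0.5):
    """Return the names of the most variable genes.

    expr maps gene name -> list of expression values across arrays.
    Array labels (e.g. treated/control) are never consulted here.
    """
    variances = {gene: statistics.variance(vals) for gene, vals in expr.items()}
    n_keep = max(1, int(len(expr) * keep_fraction))
    return sorted(variances, key=variances.get, reverse=True)[:n_keep]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: 4 genes measured on 4 arrays.
toy_expr = {
    "g1": [1, 1, 1, 1],
    "g2": [1, 5, 1, 5],
    "g3": [2, 2.1, 2, 2.1],
    "g4": [0, 10, 0, 10],
}
kept = variance_filter(toy_expr, keep_fraction=0.5)

# Hypothetical raw p-values from some per-gene test.
toy_p = {"g1": 0.60, "g2": 0.02, "g3": 0.80, "g4": 0.03}
adj_all = dict(zip(toy_p, benjamini_hochberg(list(toy_p.values()))))
adj_kept = dict(zip(kept, benjamini_hochberg([toy_p[g] for g in kept])))
```

With fewer tests after filtering, the adjusted p-values of the retained genes can only stay the same or shrink. The flip side is the caution above: a gene dropped by the filter can never be called, however interesting it is.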
Willem
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
> aoron at fhcrc.org
> Sent: Friday, 27 June 2008 02:23
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Gene Pre-filtering: My Two Shekels
>
>
> Hi all,
>
> Allow me to add my perspective as a relative newcomer into this field.
>
> At first I too was alarmed by the apparent violation of statistical
> orthodoxy involved in pre-filtering. But after witnessing how well
> this works on real data, my opinion has changed.
>
> I feel that either the statistician's perspective of p-values and
> inference or the data-miner's perspective of signal vs. noise and
> informative probes may be misleading if taken in isolation.
>
> What has helped me is thinking of the original scientific problem. We
> have a large number of genes, belonging (roughly speaking) to three
> groups: differentially expressed, non-differentially expressed, and
> not expressed at all. Typically, our task is to identify the first
> group.
>
> Now, neglecting to pre-filter is equivalent to conflating the second
> and third groups (or, equivalently, assuming that the third group does
> not exist). Indeed, the current prevalent differential-expression
> methodology ignores the existence of 3 groups. This obviously leads to
> errors.
>
> Prefiltering via nsFilter or otherwise (e.g., the McClintick and
> Edenberg 2006 article referred to by Mark) is equivalent to trying to
> identify and remove the third group, and then using DE methodology to
> separate the first two. A more sophisticated version of prefiltering
> has been recently suggested by Calza et al. 2007:
>
> S. Calza, W. Raffelsberger, A. Ploner et al. Filtering genes to
> improve sensitivity in oligonucleotide microarray data analysis.
> Nucleic Acids Research 35, #16, e102.
>
> I haven't tried this on any data yet, but they do have a home-grown R
> package available.
>
> My own gut feel is that much can be gained by looking at all 3 groups
> together and trying to distinguish between them in "one fell swoop".
> Once the problem is seen this way, we have all the pattern-recognition
> arsenal of machine learning at our disposal.
>
> Cheers, Assaf