[BioC] Gene Pre-filtering

Fri Jun 27 11:41:42 CEST 2008

I believe gene filtering is advisable as long as you do NOT USE THE LABELS of the arrays.
You should, however, always be careful not to be too stringent; a low FDR is nice but useless if you have excluded some of the interesting genes.
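
To make the "no labels" point concrete, here is a minimal sketch of a non-specific filter in plain Python. It is only an illustration, not nsFilter or any other package's method: the function name filter_by_variance, the keep_fraction cutoff, and the toy numbers are all made up for this example. The filter ranks genes by their variance across all arrays, so the group labels never enter the calculation.

```python
from statistics import variance


def filter_by_variance(expr, keep_fraction=0.5):
    """Return the ids of the most variable genes, ignoring sample labels.

    expr: dict mapping gene id -> list of expression values (one per array).
    keep_fraction: hypothetical cutoff, the fraction of genes to retain.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Variance is taken over all arrays at once, so the class labels
    # (e.g. treated vs. control) play no role in which genes survive.
    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy data: 4 genes x 6 arrays (invented values, for illustration only).
toy = {
    "geneA": [2.0, 2.1, 1.9, 6.0, 6.2, 5.9],
    "geneB": [4.0, 4.1, 3.9, 4.0, 4.2, 4.1],
    "geneC": [1.0, 1.0, 1.1, 1.0, 0.9, 1.0],
    "geneD": [3.0, 5.0, 3.2, 5.1, 2.9, 4.8],
}
kept = filter_by_variance(toy, keep_fraction=0.5)
```

A lower keep_fraction is the "too stringent" case: it shortens the list of tests but may also throw away interesting genes, so the cutoff is worth choosing conservatively.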

Another powerful gene filtering method for Affy chips, one that uses probe-level information, is I/NI calls:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/21/2897

Willem

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
> aoron at fhcrc.org
> Sent: Friday, 27 June 2008 02:23
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Gene Pre-filtering: My Two Shekels
> 
> 
> Hi all,
> 
> Allow me to add my perspective as a relative newcomer into this field.
> 
> At first I too was alarmed by the apparent violation of statistical  
> orthodoxy involved in pre-filtering. But after witnessing how well  
> this works on real data, my opinion has changed.
> 
> I feel that either the statistician's perspective of p-values and
> inference or the data-miner's perspective of signal vs. noise and
> informative probes may be misleading if taken in isolation.
> 
> What has helped me is thinking of the original scientific problem. We
> have a large number of genes, belonging (roughly speaking) to three
> groups: differentially expressed, non-differentially expressed, and
> not expressed at all. Typically, our task is to identify the first
> group.
> 
> Now, neglecting to pre-filter is equivalent to conflating the second
> and third groups (or, equivalently, assuming that the third group does
> not exist). Indeed, the current prevalent differential-expression
> methodology ignores the existence of 3 groups. This obviously leads to
> errors.
> 
> Prefiltering via nsFilter or otherwise (e.g., the McClintick and
> Edenberg 2006 article referred to by Mark) is equivalent to trying to
> identify and remove the third group, and then using DE methodology to
> separate the first two. A more sophisticated version of prefiltering
> has been recently suggested by Calza et al. 2007:
> 
> S. Calza, W. Raffelsberger, A. Ploner et al. Filtering genes to
> improve sensitivity in oligonucleotide microarray data analysis.
> Nucleic Acids Research 35, #16, e102.
> 
> I haven't tried this on any data yet, but they do have a home-grown R
> package available.
> 
> My own gut feel is that much can be gained by looking at all 3 groups
> together and trying to distinguish between them in "one fell swoop".
> Once the problem is seen this way, we have all the pattern-recognition
> arsenal of machine learning at our disposal.
> 
> Cheers, Assaf