[BioC] Invalid fold-filter

Robert Gentleman rgentlem at fhcrc.org
Fri Feb 17 20:15:08 CET 2006

Bornman, Daniel M wrote:
> Dear BioC Folks,
> As a bioinformatician within a Statistics department I often consult
> with real statisticians about the most appropriate test to apply to our
> microarray experiments.  One issue that is being debated among our
> statisticians is whether some types of fold-filtering may be invalid or
> biased in nature.  The types of fold-filtering in question are those
> that are NOT non-specific, i.e., filters that use the sample labels.
> Some filtering of a 54K probe affy chip is useful prior to making
> decisions on differential expression and there are many examples in the
> Bioconductor documentation (particularly in the {genefilter} package) on
> how to do so.  A popular method of non-specific filtering for reducing
> your probeset prior to applying statistics is to filter out probes with
> low expression, followed by filtering out probes that do not show a
> minimum difference between quartiles.  These two steps are non-specific
> in that they do not take into consideration the actual samples/arrays.
> On the other hand, if we had two groups of samples, say control versus
> treated, and we filtered out those probes that do not have a mean
> difference in expression of 2-fold between the control and treated
> groups, this filtering would be based on the actual samples.  This is NOT a
> non-specific filter.  The problem then comes (or rather the debate here
> arises) when a t-test is calculated for each probe that passed the
> sample-specific fold-filtering and the p-values are adjusted for
> multiple comparisons by, for example, the Benjamini & Hochberg method.
> Is it valid to fold-filter using the sample identity as a criterion
> followed by correcting for multiple comparisons using just those probes
> that made it through the fold-filter?  When correcting for multiple
> comparisons, you take a penalty for the number of comparisons you are
> correcting for.  The larger the pool of comparisons, the larger the penalty,
> thus the larger the adjusted p-value.  Or more importantly, the smaller
> the set, the less your adjusted p-value is adjusted (increased) relative
> to your raw p-value.  The argument is that you have used the very
> samples you are comparing in order to unfairly reduce the adjusted
> p-value penalty.
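
The multiplicity penalty described above can be made concrete with a small
Benjamini-Hochberg example (sketched in Python rather than R, purely for
illustration; the function below is my own minimal implementation, not the
one in R's stats package). The same strong p-value pays a very different
penalty depending on the size of the pool it is adjusted within:

```python
import random

def bh_adjust(pvals):
    # Benjamini-Hochberg step-up adjusted p-values (standard algorithm):
    # sort, scale p[i] by m/rank, then enforce monotonicity from the top.
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

random.seed(0)
# The same strong p-value (0.001) adjusted within a pool of 10 tests
# versus a pool of 1000 tests (the other p-values are uniform noise).
p_ten = [0.001] + [random.uniform(0, 1) for _ in range(9)]
p_thousand = [0.001] + [random.uniform(0, 1) for _ in range(999)]

adj_ten = bh_adjust(p_ten)[0]
adj_thousand = bh_adjust(p_thousand)[0]
print(f"adjusted p: {adj_ten:.4f} among 10 tests, "
      f"{adj_thousand:.4f} among 1000 tests")
```

This is why shrinking the pool by filtering directly shrinks the adjusted
p-values of the survivors, which is precisely what makes a label-aware
filter tempting.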

  It is not valid to use the phenotype to compute per-gene t-statistics, 
filter based on those p-values, and then apply p-value correction methods 
to the result. I don't think we need research; it seems pretty obvious 
that this is not a valid approach.
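
To see why, here is a toy pure-noise simulation (sketched in Python rather
than R for brevity; all names and cutoffs below are my own, and the normal
approximation to the t reference is a simplification). Among genes passing
a label-aware fold filter, far more than the nominal 5% show p < 0.05 even
though nothing is truly differential, so the p-values handed to any
correction method are no longer valid null p-values:

```python
import math
import random
from statistics import NormalDist

# Toy simulation: 2000 "genes", 20 control vs 20 treated samples,
# pure noise -- no gene is truly differentially expressed.
random.seed(1)
N_GENES, N = 2000, 20

def two_sided_p(x, y):
    # Welch-style statistic with a normal approximation to its null
    # distribution (adequate here; a real analysis would use a t reference).
    mx, my = sum(x) / N, sum(y) / N
    vx = sum((v - mx) ** 2 for v in x) / (N - 1)
    vy = sum((v - my) ** 2 for v in y) / (N - 1)
    z = (mx - my) / math.sqrt(vx / N + vy / N)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

genes = [([random.gauss(0, 1) for _ in range(N)],
          [random.gauss(0, 1) for _ in range(N)]) for _ in range(N_GENES)]

pvals = [two_sided_p(x, y) for x, y in genes]
frac_all = sum(p < 0.05 for p in pvals) / N_GENES  # ~0.05, as it should be

# "Specific" filter: keep genes whose between-group mean difference is
# large -- computed from the SAME labels the test then uses.
kept = [p for (x, y), p in zip(genes, pvals)
        if abs(sum(x) / N - sum(y) / N) > 0.5]
frac_kept = sum(p < 0.05 for p in kept) / len(kept)  # far above 0.05

print(f"all genes: {frac_all:.3f} with p < 0.05; "
      f"after specific filter: {frac_kept:.3f}")
```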

   You can do non-specific filtering, but all you are really doing there 
is removing genes that are inherently uninteresting no matter what the 
phenotype of the corresponding sample is (if there is no variation in 
expression for a particular gene across samples, then it carries no 
information about the phenotype of the sample). Filtering on low values 
is probably a bad idea, although many do it (and I used to, and still do 
sometimes, depending on the task at hand).
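
By contrast, a non-specific filter computed without the group labels leaves
the survivors' null p-values essentially uniform. A hedged sketch (Python
for illustration only; in practice one would use the genefilter package in
R, and the crude quartile-spread filter below is my own stand-in):

```python
import math
import random
from statistics import NormalDist

# Pure-noise data again: a non-specific filter on overall spread (IQR
# across ALL samples, group labels ignored) should not distort p-values.
random.seed(2)
N_GENES, N = 2000, 20

def two_sided_p(x, y):
    # Same simplified test as before: normal approximation to the t null.
    mx, my = sum(x) / N, sum(y) / N
    vx = sum((v - mx) ** 2 for v in x) / (N - 1)
    vy = sum((v - my) ** 2 for v in y) / (N - 1)
    z = (mx - my) / math.sqrt(vx / N + vy / N)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def iqr(values):
    # Crude quartile spread; fine for a toy illustration.
    s = sorted(values)
    return s[3 * len(s) // 4] - s[len(s) // 4]

genes = [([random.gauss(0, 1) for _ in range(N)],
          [random.gauss(0, 1) for _ in range(N)]) for _ in range(N_GENES)]

# Keep roughly the half of the genes with the largest overall spread,
# never looking at which samples are control and which are treated.
spreads = sorted(iqr(x + y) for x, y in genes)
cutoff = spreads[N_GENES // 2]
kept_p = [two_sided_p(x, y) for x, y in genes if iqr(x + y) > cutoff]

frac = sum(p < 0.05 for p in kept_p) / len(kept_p)  # stays near 0.05
print(f"{len(kept_p)} genes pass the IQR filter; "
      f"fraction with p < 0.05: {frac:.3f}")
```

The surviving genes show roughly the nominal 5% of small p-values, so the
downstream multiple-testing correction is applied to honest p-values.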

  Best wishes

> Has anyone considered this issue or heard of problems of using a
> specific type of filtering rather than a non-specific one?
> Thank You for any responses.
> Daniel Bornman
> Research Scientist
> Battelle Memorial Institute
> 505 King Ave
> Columbus, OH 43201
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
rgentlem at fhcrc.org
