[BioC] Invalid fold-filter

Fri Feb 17 19:55:45 CET 2006

Dear BioC Folks,

As a bioinformatician within a Statistics department I often consult
with real statisticians about the most appropriate test to apply to our
microarray experiments.  One issue that is being debated among our
statisticians is whether some types of fold-filtering may be invalid or
biased.  The types of fold-filtering in question are those that are
specific, that is, NOT non-specific.  
Some filtering of a 54K probe affy chip is useful prior to making
decisions on differential expression and there are many examples in the
Bioconductor documentation (particularly in the {genefilter} package) on
how to do so.  A popular method of non-specific filtering for reducing
your probe set prior to applying statistics is to filter out lowly
expressed probes and then filter out probes that do not show a minimum
difference between quartiles (interquartile range) across arrays.  These
two steps are non-specific in that they do not take into consideration
which group each sample/array belongs to.
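
A minimal sketch of these two non-specific steps, written in Python with
NumPy rather than with the {genefilter} package itself.  The function name,
the thresholds, and the simulated log2 data are all hypothetical and only
meant to illustrate that neither step looks at the group labels.

```python
import numpy as np

def nonspecific_filter(expr, min_expr=7.0, min_frac=0.25, min_iqr=0.5):
    """Return a boolean mask of probes to keep.

    expr is a probes x arrays matrix of log2 intensities. The thresholds
    are illustrative assumptions, not recommended values.
    """
    expr = np.asarray(expr, dtype=float)
    # Step 1: keep probes expressed above min_expr in at least min_frac of arrays.
    expressed = (expr > min_expr).mean(axis=1) >= min_frac
    # Step 2: keep probes whose interquartile range across ALL arrays is large enough.
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    variable = (q75 - q25) >= min_iqr
    return expressed & variable

# Simulated data: 1000 probes on 10 arrays (no group labels are used anywhere).
rng = np.random.default_rng(0)
expr = rng.normal(8.0, 1.0, size=(1000, 10))
keep = nonspecific_filter(expr)
```

Because the mask is computed from all arrays pooled together, shuffling the
columns (and hence the control/treated assignment) leaves it unchanged.
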
On the other hand, if we had two groups of samples, say control versus
treated, and we filtered out those probes that do not have a mean
difference in expression of 2-fold between the control and treated
groups, this filtering would be based on the actual sample identities.  This is NOT a
non-specific filter.  The problem then comes (or rather the debate here
arises) when a t-test is calculated for each probe that passed the
sample-specific fold-filtering and the p-values are adjusted for
multiple comparisons by, for example, the Benjamini & Hochberg method.
Is it valid to fold-filter using the sample identity as a criterion
followed by correcting for multiple comparisons using just those probes
that made it through the fold-filter?  When correcting for multiple
comparisons you take a penalty for the number of comparisons you are
correcting.  The larger the pool of comparisons, the larger the penalty,
thus the larger the adjusted p-value.  Or, more importantly, the smaller
the set, the less your adjusted p-value is increased relative to your
raw p-value.  The argument is that you used the group labels of the very
samples you are comparing to unfairly reduce the adjusted p-value
penalty.
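
To make the size of that penalty concrete, here is a small sketch of the
Benjamini & Hochberg adjustment in Python with NumPy.  The helper names and
the evenly spaced "other" p-values are assumptions chosen purely for
illustration; the sketch only shows the arithmetic of the adjustment and
does not by itself settle whether the specific filter is valid.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini & Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def smallest_adjusted(p_small, m):
    """Adjusted value of p_small when it is the smallest of m p-values.

    The other m - 1 p-values are spread evenly over [0.01, 1] (an assumption).
    """
    others = np.linspace(0.01, 1.0, m - 1)
    return bh_adjust(np.concatenate(([p_small], others)))[0]

# Same raw p-value, corrected within a fold-filtered set and within the full chip.
adj_filtered = smallest_adjusted(1e-4, 500)
adj_full = smallest_adjusted(1e-4, 54000)
```

Under these assumptions the same raw p-value of 1e-4 adjusts to about 0.05
among 500 probes but to about 1 among 54,000 probes, which is the reduction
in penalty being debated.
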
Has anyone considered this issue or heard of problems of using a
specific type of filtering rather than a non-specific one?
Thank you for any responses.

Daniel Bornman
Research Scientist
Battelle Memorial Institute
505 King Ave
Columbus, OH 43201