[BioC] Invalid fold-filter
Bornman, Daniel M
bornmand at BATTELLE.ORG
Fri Feb 17 20:34:01 CET 2006
I of course agree that filtering on a variable (phenotype) that will be
used later to calculate adjusted p-values is flawed, and therefore it is
not a method I would implement; however, it seems that many who
describe fold-filtering are doing just that.
Thank you for your response.
-----Original Message-----
From: Robert Gentleman [mailto:rgentlem at fhcrc.org]
Sent: Friday, February 17, 2006 2:15 PM
To: Bornman, Daniel M
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Invalid fold-filter
Bornman, Daniel M wrote:
> Dear BioC Folks,
>
> As a bioinformatician within a Statistics department I often consult
> with real statisticians about the most appropriate test to apply to
> our microarray experiments. One issue that is being debated among our
> statisticians is whether some types of fold-filtering may be invalid
> or biased in nature. The types of fold-filtering in question are
> those that tend to NOT be non-specific.
> Some filtering of a 54K probe affy chip is useful prior to making
> decisions on differential expression and there are many examples in
> the Bioconductor documentation (particularly in the {genefilter}
> package) on how to do so. A popular method of non-specific filtering
> for reducing your probeset prior to applying statistics is to filter
> out low expressed probes followed by filtering out probes that do not
> show a minimum difference between quartiles. These two steps are
> non-specific in that they do not take into consideration the actual
> samples/arrays.
> On the other hand, if we had two groups of samples, say control versus
> treated, and we filtered out those probes that do not have a mean
> difference in expression of 2-fold between the control and treated
> groups, this filtering was based on the actual samples. This is NOT a
> non-specific filter. The problem then comes (or rather the debate here
> arises) when a t-test is calculated for each probe that passed the
> sample-specific fold-filtering and the p-values are adjusted for
> multiple comparisons by, for example, the Benjamini & Hochberg method.
> Is it valid to fold-filter using the sample identity as a criterion
> followed by correcting for multiple comparisons using just those
> probes that made it through the fold-filter? When correcting for
> multiple comparisons you take a penalty for the number of comparisons
> you are correcting for. The larger the pool of comparisons, the larger
> the penalty, thus the larger the adjusted p-value. Or more
> importantly, the smaller the set, the less your adjusted p-value is
> adjusted (increased) relative to your raw p-value. The argument is
> that you used the very samples you are comparing to unfairly reduce
> the adjusted p-value penalty.
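
To make the penalty described above concrete, here is a minimal Python
sketch of the Benjamini & Hochberg step-up adjustment. It is not code from
this thread: the function name bh_adjust and the p-values are made up for
illustration. It suggests how the same raw p-values receive smaller
adjusted values when only a pre-selected subset is corrected.

```python
def bh_adjust(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, smallest p first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for 10 probes (illustrative numbers only).
raw = [0.001, 0.004, 0.02, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9]

adjusted_all = bh_adjust(raw)         # corrected over all 10 tests
adjusted_subset = bh_adjust(raw[:3])  # corrected over 3 pre-selected tests

for p, a_all, a_sub in zip(raw[:3], adjusted_all, adjusted_subset):
    print(f"raw={p:.3f}  adjusted over 10={a_all:.4f}  adjusted over 3={a_sub:.4f}")
```

In this toy case the three smallest p-values are adjusted by a factor based
on m = 3 instead of m = 10, which is the reduced penalty the question is
about.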
It is not valid to use phenotype to compute t-statistics for a
particular phenotype and filter based on those p-values and to then use
p-value correction methods on the result. I don't think we need
research; it seems pretty obvious that this is not a valid approach.
You can do non-specific filtering, but all you are really doing there
is to remove genes that are inherently uninteresting no matter what the
phenotype of the corresponding sample (if there is no variation in
expression for a particular gene across samples then it has no
information about the phenotype of the sample). Filtering on low values
is probably a bad idea although many do it (and I used to, and still do
sometimes depending on the task at hand).
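
A non-specific variation filter of the kind described above could be
sketched as follows. This is an illustrative Python sketch, not the
genefilter API: the function names, probe names, expression values, and the
0.5 IQR cutoff are all assumptions. The point is that the filter never sees
the phenotype labels, only the spread of each probe across samples.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range of one probe's values across all samples."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def nonspecific_filter(expression, min_iqr):
    """Keep probes whose IQR across samples is at least min_iqr.

    No sample labels are passed in, so the result cannot depend on phenotype.
    """
    return [probe for probe, values in expression.items() if iqr(values) >= min_iqr]


# Hypothetical log2 expression values for two probes over six samples.
expression = {
    "probe_flat": [7.0, 7.1, 6.9, 7.0, 7.1, 6.9],
    "probe_variable": [5.0, 9.0, 5.5, 8.5, 6.0, 8.0],
}

kept = nonspecific_filter(expression, min_iqr=0.5)
print(kept)
```

Because the filter looks only at each probe's spread, reordering the
samples or relabeling them leaves the set of kept probes unchanged.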
Best wishes
Robert
> Has anyone considered this issue or heard of problems of using a
> specific type of filtering rather than a non-specific one?
> Thank You for any responses.
>
> Daniel Bornman
> Research Scientist
> Battelle Memorial Institute
> 505 King Ave
> Columbus, OH 43201
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org