[BioC] Invalid fold-filter
Jenny Drnevich
drnevich at uiuc.edu
Mon Feb 20 18:05:34 CET 2006
Hello all,
I have also pondered over the issue of filtering genes to reduce the amount
of multiple hypothesis correction and what is or isn't valid statistically.
I do routinely filter on some estimate of "presence", either Affy's P/M/A
calls, or for spotted arrays, comparison to blanks, buffers and/or negative
controls. However, I only filter out a gene if it is not deemed "present" on
any of the arrays; my rationale for this is that if it's a whole-genome array,
only a subset of those genes will be expressed in any particular tissue,
developmental stage, etc. I keep a gene if it is "present" in at least one
sample rather than say, half the samples as I've seen in other analyses,
because the possibility exists that a gene may be expressed in only one of
the treatment groups.
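
As a rough illustration of the "present in at least one sample" rule described above, here is a minimal Python sketch. The function name, the `min_count` parameter, the gene names and the call matrix are all made up for illustration; the only thing taken from the text is the rule itself, with calls coded as P/M/A in the Affy style.

```python
def keep_present(calls, min_count=1, present_labels=("P",)):
    """Return the genes called present on at least `min_count` arrays.

    `calls` maps a gene ID to its list of per-array detection calls.
    Both the layout and the parameter names are assumptions of this sketch.
    """
    kept = []
    for gene, row in calls.items():
        n_present = sum(1 for call in row if call in present_labels)
        if n_present >= min_count:
            kept.append(gene)
    return kept


# Hypothetical calls for four genes on four arrays (two per treatment group).
example_calls = {
    "gene_a": ["P", "P", "P", "P"],  # present everywhere
    "gene_b": ["A", "A", "A", "P"],  # present on a single array
    "gene_c": ["A", "M", "A", "A"],  # marginal at best, never present
    "gene_d": ["A", "A", "A", "A"],  # absent everywhere
}

kept_any = keep_present(example_calls, min_count=1)   # the rule in the text
kept_half = keep_present(example_calls, min_count=2)  # the stricter "half the samples" rule
```

With these made-up calls, the "at least one" rule keeps gene_a and gene_b, while the "half the samples" rule drops gene_b, which is exactly the kind of gene that might be expressed in only one treatment group.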
On the other hand, I've never been comfortable with filtering on even a
non-specific measure of variation across arrays. After reading Jim's
response, I agree that if you're mainly interested in sample
classification, then it could be reasonable to filter out genes that do not
vary, but it still doesn't seem right to do this if you're mainly
interested in determining differential expression between two or more known
classes. My reasoning is that the p-values are based on the null
F-distribution, and that by removing genes with little variance, you are in
effect removing the left side of the F-distribution, which would seem to
invalidate the p-values because the area under the remaining distribution
has changed. If you couldn't tell, my logic is not based on formal
statistical theory but rather on my intuitive feel for the matter!
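
One way to probe this intuition is a small simulation, sketched below in Python under several assumptions that are not in the original message: two groups of three arrays, every gene drawn from the same normal distribution (so every gene is null), an equal-variance t-test (equivalent to the two-group F-test), and a filter that keeps the half of the genes with the largest overall variance across all six arrays. The function names and settings are hypothetical. The sketch only reports how often null genes pass the 5% cutoff before and after filtering; it is not a formal argument either way.

```python
import random

N_PER_GROUP = 3
T_CRIT = 2.776  # two-sided 5% critical value of Student's t with 4 df (3 + 3 - 2)


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_statistic(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1.0 / na + 1.0 / nb)) ** 0.5


def simulate(n_genes=5000, keep_fraction=0.5, seed=1):
    """Return (rejection rate over all genes, rate over kept genes, number kept)."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        values = [rng.gauss(0.0, 1.0) for _ in range(2 * N_PER_GROUP)]
        t = t_statistic(values[:N_PER_GROUP], values[N_PER_GROUP:])
        # Store the non-specific filter statistic and whether the test rejects.
        genes.append((variance(values), abs(t) > T_CRIT))
    # Keep the genes with the largest overall variance, ignoring group labels.
    genes.sort(key=lambda g: g[0], reverse=True)
    kept = genes[: int(n_genes * keep_fraction)]
    rate_all = sum(1 for _, hit in genes if hit) / len(genes)
    rate_kept = sum(1 for _, hit in kept if hit) / len(kept)
    return rate_all, rate_kept, len(kept)


if __name__ == "__main__":
    rate_all, rate_kept, n_kept = simulate()
    print(f"rejection rate, all genes:  {rate_all:.3f}")
    print(f"rejection rate, kept genes: {rate_kept:.3f} (n = {n_kept})")
```

If removing low-variance genes really distorted the null distribution in this setting, the rate among the kept genes would drift away from the nominal 0.05. Swapping in a different filter statistic (for example a fold change between the two groups) is a one-line change to what is stored alongside the test result.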
Cheers,
Jenny
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu