[BioC] (no subject)
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Jan 14 00:04:34 CET 2009
It's very common practice to keep all the probes for normalization, then
to filter control probes and consistently non-expressed probes before
differential expression analysis. I recommend this and do it myself.
It's such common practice that it's surprising to see a paper on it at
all.

It is in the spirit of normalization methods that all probes should be
retained for normalization, except in unusual cases in which some probes
are obviously poor quality for reasons other than expression level.

At the differential expression step, probes can be usefully filtered out
if they are not of any potential interest. This means control probes, or
probes which appear to be non-expressed across all conditions in the
experiment, i.e., on all arrays. I have frequently complained on this
mailing list about the practice of filtering individual low intensity
probes on individual arrays, which IMO is a very destructive practice.
If you filter a probe on the basis of expression, it must be filtered on
all arrays.
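As an illustration of that rule (a probe is dropped on every array or on none), here is a minimal sketch in Python rather than R. The intensities, the 6.0 log2 threshold, the `ctrl_` naming of control probes and the `filter_non_expressed` helper are all hypothetical; `min_arrays` is set to 2, the assumed size of each experimental group, so that a probe expressed in only one condition is kept.

```python
def filter_non_expressed(expr, threshold, min_arrays=1):
    """Keep a probe (a whole row) if it is above `threshold` on at least
    `min_arrays` arrays; otherwise drop it from every array.

    expr: dict mapping probe ID -> list of log2 intensities, one per array.
    Rows are never partially masked: a kept probe keeps all of its values.
    """
    kept = {}
    for probe, values in expr.items():
        n_expressed = sum(1 for v in values if v > threshold)
        if n_expressed >= min_arrays:
            kept[probe] = list(values)
    return kept


# Hypothetical log2 intensities; arrays 1-2 are condition 1, arrays 3-4
# are condition 2.
expr = {
    "probe_A": [9.1, 8.7, 9.4, 9.0],    # expressed everywhere
    "probe_B": [4.2, 4.0, 7.9, 8.3],    # expressed in condition 2 only
    "probe_C": [4.1, 3.8, 4.3, 4.0],    # non-expressed on all arrays
    "ctrl_1": [12.0, 11.8, 12.1, 11.9],  # control probe
}

# Control probes are removed by name (hypothetical "ctrl_" prefix).
expr = {p: v for p, v in expr.items() if not p.startswith("ctrl_")}
filtered = filter_non_expressed(expr, threshold=6.0, min_arrays=2)
print(sorted(filtered))  # ['probe_A', 'probe_B']
```

probe_B survives because it is expressed on both arrays of one condition, while probe_C is removed from every array at once. Nothing in the sketch masks individual low values on individual arrays.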
Filtering non-expressed probes tends not to be emphasised on this list
because users of this list are often sophisticated enough to use variance
stabilizing normalization methods such as rma, vsn, normexp or vst. This
means that low-expression filtering is done more for multiplicity issues
than for variance stabilization, and therefore often doesn't make a huge
difference. When using earlier normalization methods such as MAS for Affy
or local background correction for two-color arrays, expression-filtering
is absolutely essential, because the normalized expression values are so
unstable at low intensity levels.
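The multiplicity effect can be seen with a hand-written Benjamini-Hochberg adjustment (the same adjustment R's `p.adjust(method = "BH")` performs), sketched here in Python. The p-values and the `expressed` flags are invented for illustration; the assumption is that the flags come from an intensity filter decided independently of the test results, not from the p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values; the `expressed` flags stand in for an intensity
# filter applied before testing, not for a filter on the p-values.
pvalues = [0.01, 0.02, 0.03, 0.35, 0.60, 0.90]
expressed = [True, True, True, False, False, False]

all_adj = bh_adjust(pvalues)
kept_adj = bh_adjust([p for p, e in zip(pvalues, expressed) if e])
print([round(a, 3) for a in all_adj])   # [0.06, 0.06, 0.06, 0.525, 0.72, 0.9]
print([round(a, 3) for a in kept_adj])  # [0.03, 0.03, 0.03]
```

With all six tests the three smallest p-values adjust to 0.06; after dropping the three hypothetical non-expressed probes they adjust to 0.03, so at a 5% FDR they become significant. Only the number of tests changed.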
To James's point, it is not necessary to retain all the probes on the array
for eBayes(). The only requirement is that eBayes() sees all the probes
which are under consideration for differential expression. So filtering
out consistently non-expressed probes before linear modelling is generally
a good idea. In fact, filtering often improves the eBayes() assumptions.
eBayes assumes that the residual variances are not intensity-dependent.
However, very lowly expressed probes often follow a mean-variance
relationship which is somewhat different from the other probes, even after
variance stabilization, in which case filtering will improve the constancy
of variance assumption. This tends not to be a big issue with rma-Affy
data, but it is an important issue with vst-Illumina data for example.
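The constancy-of-variance point can be sketched with the variance moderation formula used by limma's eBayes(): each probe's residual variance s² on d degrees of freedom is shrunk towards a common prior variance s0² carrying d0 prior degrees of freedom, giving (d0·s0² + d·s²)/(d0 + d). In limma both d0 and s0² are estimated from all the probes given to eBayes(); the Python sketch below instead fixes d0 at a hypothetical 4 and uses the plain mean of the variances as a crude stand-in for s0², so it shows only the direction of the effect, not limma's actual estimates.

```python
from statistics import mean


def moderated_variances(s2, d, d0, s02):
    """Shrink each residual variance in `s2` (each on `d` residual df)
    towards a common prior variance `s02` carrying `d0` prior df."""
    return [(d0 * s02 + d * v) / (d0 + d) for v in s2]


d = 4      # residual df per probe (hypothetical: 6 arrays, 2 coefficients)
d0 = 4.0   # prior df, fixed by hand here; limma estimates it from the data

high = [0.04, 0.05, 0.06]  # hypothetical variances of well-expressed probes
low = [0.30, 0.40, 0.50]   # hypothetical variances of near-background probes

# Crude stand-in for the prior variance (NOT limma's estimator): the mean
# residual variance of whichever probes are passed to the moderation step.
s02_unfiltered = mean(high + low)
s02_filtered = mean(high)

with_low = moderated_variances(high, d, d0, s02_unfiltered)
without_low = moderated_variances(high, d, d0, s02_filtered)
print([round(v, 4) for v in with_low])     # [0.1325, 0.1375, 0.1425]
print([round(v, 4) for v in without_low])  # [0.045, 0.05, 0.055]
```

With the hypothetical near-background probes included, the moderated variances of the well-expressed probes come out roughly 2.5 to 3 times larger, which would shrink their moderated t-statistics.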
> Date: Mon, 12 Jan 2009 09:25:02 -0500
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Subject: Re: [BioC] Filtering before differential expression analysis
> of microarrays - New paper out
> To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> Cc: bioconductor at stat.math.ethz.ch
> Hi Dan,
> Daniel Brewer wrote:
>> There is a new paper out at BMC bioinformatics that seems to justify the
>> use of filtering before differential expression analysis is performed
>> (Hackstadt & Hess BMC Bioinformatics 2009, 10:11 -
>> http://www.biomedcentral.com/1471-2105/10/11/abstract). Specifically
>> filtering by variance and detection call. I have got the impression
>> from this list that the general opinion is that one should only filter
>> out the control genes before testing. I was wondering if anyone had any
>> opinions on this paper and the topic in general.
> I'm sure people do have opinions about this topic ;-D
> The reason people have so many opinions is because it isn't a simple
> question, and it depends on what you consider important.
> If you are just trying to limit the number of multiple comparisons to
> increase power, then filtering first is probably the way to go.
> If you are concerned with the accuracy of the FDR estimates, then
> filtering first may not be ideal.
> If you are using limma (Hackstadt and Hess used multtest), then you
> should filter after the eBayes step but before the FDR step, as an
> assumption of the eBayes step is that all of the data from the chip are
> used.
> Unless of course you are concerned about the accuracy of the FDR
> estimates, in which case... well you see the point.
> With microarray data analysis the arguments for and against a particular
> way of doing things can shed more heat than light, as nobody really
> knows the underlying truth, and the measures we use are really far
> removed from the actual phenomenon we are testing.
>> Many thanks
> James W. MacDonald, M.S.
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-5646