[BioC] Filtering before differential expression analysis of microarrays - New paper out
Gordon Smyth
smyth at wehi.EDU.AU
Wed Jan 14 00:09:30 CET 2009
Dear Dan,
It's very common practice to keep all the probes for normalization,
then to filter control probes and consistently non-expressed probes
before differential expression analysis. I recommend this and do it
myself. It's such common practice that it's surprising to see a paper
on it at this stage.
It is in the spirit of normalization methods that all probes should
be retained for normalization, except in unusual cases in which some
probes are obviously poor quality for reasons other than expression level.
At the differential expression step, probes can be usefully filtered
out if they are not of any potential interest. This means control
probes, or probes which appear to be non-expressed across all
conditions in the experiment, i.e., on all arrays. I have frequently
complained on this mailing list about the practice of filtering
individual low intensity probes on individual arrays, which IMO is a
very destructive practice. If you filter a probe on the basis of
expression, it must be filtered on all arrays.
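To make the distinction concrete, here is a minimal sketch in plain
Python (not limma code). The function name, the threshold of 6.0 and
the "expressed on at least min_arrays arrays" rule are assumptions
chosen for illustration, and the intensities are made up. The point is
only that a probe is kept or dropped as a whole row, never masked on
individual arrays.

```python
def filter_probes(expr, threshold, min_arrays=1):
    """Keep probes expressed above `threshold` on at least `min_arrays` arrays.

    `expr` maps probe ID -> list of log-expression values, one per array.
    A probe is either kept on every array or removed from every array.
    """
    kept = {}
    for probe, values in expr.items():
        n_expressed = sum(1 for v in values if v > threshold)
        if n_expressed >= min_arrays:
            kept[probe] = list(values)
    return kept


# Hypothetical log2 intensities for four probes on three arrays.
expr = {
    "probe_a": [9.1, 8.7, 9.4],  # expressed on all arrays: kept
    "probe_b": [4.2, 4.0, 4.5],  # non-expressed on all arrays: removed
    "probe_c": [4.1, 8.9, 9.0],  # low on one array only: kept in full
    "probe_d": [5.9, 5.2, 5.5],  # below threshold on all arrays: removed
}

filtered = filter_probes(expr, threshold=6.0, min_arrays=1)
print(sorted(filtered))
```

Note that probe_c keeps its low value on the first array. Dropping
only that one value would be the per-array filtering criticised above.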
Filtering non-expressed probes tends not to be emphasised on this list
because users of this list are often sophisticated enough to use
variance stabilizing normalization methods such as rma, vsn, normexp
or vst. This means that low-expression filtering is done more for
multiplicity issues than for variance stabilization, and therefore
often doesn't make a huge difference. When using earlier
normalization methods such as MAS for Affy or local background
correction for two-color arrays, expression-filtering is absolutely
essential, because the normalized expression values are so unstable
at low intensity levels.
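The multiplicity effect can be sketched with a small Benjamini-Hochberg
calculation in plain Python. This is an illustration only: the
p-values are invented, and bh_adjust is a hand-written helper, not a
function from limma or multtest.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values: two expressed probes, eight non-expressed probes.
p_expressed = [0.001, 0.012]
p_nonexpressed = [0.35, 0.48, 0.52, 0.61, 0.70, 0.77, 0.85, 0.93]

adj_all = bh_adjust(p_expressed + p_nonexpressed)  # no filtering
adj_filtered = bh_adjust(p_expressed)              # after filtering

print(adj_all[:2], adj_filtered)
```

With these made-up numbers the second probe has an adjusted p-value of
about 0.06 when all ten probes are tested and about 0.012 when only
the two expressed probes are tested.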
To James, it is not necessary to retain all the probes on the
array for eBayes(). The only requirement is that eBayes() sees all
the probes which are under consideration for differential
expression. So filtering out consistently non-expressed probes
before linear modelling is generally a good idea. In fact, filtering
often improves the eBayes() assumptions. eBayes() assumes that the
residual variances are not intensity-dependent. However, very lowly
expressed probes often follow a mean-variance relationship which is
somewhat different from the other probes, even after variance
stabilization, in which case filtering will improve the constancy of
variance assumption. This tends not to be a big issue with rma-Affy
data, but it is an important issue with vst-Illumina data for example.
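One rough way to look for this kind of intensity dependence is to
compare probe-wise variances below and above an average-intensity
cutoff. The sketch below is plain Python, not part of limma. It
assumes the arrays are replicates of a single condition, so that the
sample variance stands in for the residual variance; the cutoff, the
function name and the data are all hypothetical.

```python
from statistics import mean, median, variance


def variance_by_intensity(expr, cutoff):
    """Median probe-wise variance below and at-or-above an intensity cutoff.

    `expr` maps probe ID -> list of log-expression values from replicate
    arrays. Both intensity groups must contain at least one probe.
    """
    low, high = [], []
    for values in expr.values():
        target = low if mean(values) < cutoff else high
        target.append(variance(values))
    return median(low), median(high)


# Hypothetical log2 intensities on three replicate arrays.
expr = {
    "p1": [4.0, 5.0, 6.0],     # low intensity, noisy
    "p2": [3.0, 5.0, 4.0],     # low intensity, noisy
    "p3": [9.0, 9.2, 9.1],     # well expressed, stable
    "p4": [11.0, 11.2, 10.9],  # well expressed, stable
}

low_var, high_var = variance_by_intensity(expr, cutoff=6.0)
print(low_var, high_var)
```

A large gap between the two medians suggests that the low-intensity
probes follow a different mean-variance relationship, which is the
situation in which filtering them helps the constant-variance
assumption.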
Best wishes
Gordon
>Date: Mon, 12 Jan 2009 09:25:02 -0500
>From: "James W. MacDonald" <jmacdon at med.umich.edu>
>Subject: Re: [BioC] Filtering before differential expression analysis
> of microarrays - New paper out
>To: Daniel Brewer <daniel.brewer at icr.ac.uk>
>Cc: bioconductor at stat.math.ethz.ch
>
>Hi Dan,
>
>Daniel Brewer wrote:
>>Hi,
>>
>>There is a new paper out at BMC bioinformatics that seems to justify the
>>use of filtering before differential expression analysis is performed
>>(Hackstadt & Hess BMC Bioinformatics 2009, 10:11 -
>>http://www.biomedcentral.com/1471-2105/10/11/abstract). Specifically
>>filtering by variance and detection call. I have got the impression
>>from this list that the general opinion is that one should only filter
>>out the control genes before testing. I was wondering if anyone had any
>>opinions on this paper and the topic in general.
>
>I'm sure people do have opinions about this topic ;-D
>
>The reason people have so many opinions is because it isn't a simple
>question, and it depends on what you consider important.
>
>If you are just trying to limit the number of multiple comparisons to
>increase power, then filtering first is probably the way to go.
>
>If you are concerned with the accuracy of the FDR estimates, then
>filtering first may not be ideal.
>
>If you are using limma (Hackstadt and Hess used multtest), then you
>should filter after the eBayes step but before the FDR step, as an
>assumption of the eBayes step is that all of the data from the chip are
>available.
>
>Unless of course you are concerned about the accuracy of the FDR
>estimates, in which case... well you see the point.
>
>With microarray data analysis the arguments for and against a particular
>way of doing things can shed more heat than light, as nobody really
>knows the underlying truth, and the measures we use are really far
>removed from the actual phenomenon we are testing.
>
>Best,
>
>Jim
>
>
>>
>>Many thanks
>>
>>Dan
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Hildebrandt Lab
>8220D MSRB III
>1150 W. Medical Center Drive
>Ann Arbor MI 48109-5646
>734-936-8662