[BioC] Filtering before differential expression analysis of microarrays - New paper out

Daniel Brewer daniel.brewer at icr.ac.uk
Thu Jan 15 11:49:32 CET 2009

Thanks for the brilliant answer.  Very interesting stuff.  The only
other question I would like to ask is how you decide that a probe is
non-expressed.  Is this done by inspecting some kind of plot (e.g. an
MA plot), by taking a fixed percentage of probes, or by using some
absolute value known from experience?  For Affy arrays you can use the
DaBG results, but I am not sure what the correct approach would be with
two-colour microarrays.

Many thanks


Gordon Smyth wrote:
> Dear Dan,
> It's very common practice to keep all the probes for normalization, then
> to filter control probes and consistently non-expressed probes before
> differential expression analysis.  I recommend this and do it myself.
> It's such common practice that it's surprising to see a paper on it at
> this stage.
> It is in the spirit of normalization methods that all probes should be
> retained for normalization, except in unusual cases in which some probes
> are obviously poor quality for reasons other than expression level.
> At the differential expression step, probes can be usefully filtered out
> if they are not of any potential interest.  This means control probes,
> or probes which appear to be non-expressed across all conditions in the
> experiment, i.e., on all arrays. I have frequently complained on this
> mailing list about the practice of filtering individual low intensity
> probes on individual arrays, which IMO is a very destructive practice.
> If you filter a probe on the basis of expression, it must be filtered on
> all arrays.
> Filtering non-expressed probes tends not to be emphasised on this list
> because users of this list are often sophisticated enough to use
> variance stabilizing normalization methods such as rma, vsn, normexp or
> vst.  This means that low-expression filtering is done more for
> multiplicity issues than for variance stabilization, and therefore often
> doesn't make a huge difference.  When using earlier normalization
> methods such as MAS for Affy or local background correction for
> two-color arrays, expression-filtering is absolutely essential, because
> the normalized expression values are so unstable at low intensity levels.
> To James, it is not necessary to retain all the probes on the array
> for eBayes().  The only requirement is that eBayes() sees all the probes
> which are under consideration for differential expression.  So filtering
> out consistently non-expressed probes before linear modelling is
> generally a good idea.  In fact, filtering often improves the eBayes()
> assumptions. eBayes assumes that the residual variances are not
> intensity-dependent. However, very lowly expressed probes often follow a
> mean-variance relationship which is somewhat different from the other
> probes, even after variance stabilization, in which case filtering will
> improve the constancy of variance assumption.  This tends not to be a
> big issue with rma-Affy data, but it is an important issue with
> vst-Illumina data for example.
> Best wishes
> Gordon
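
The "filter on all arrays" rule in the quoted message can be sketched
as follows.  This is only a minimal illustration in plain Python, not
limma code: the probe names, the log2 values and the cutoff are all
made up, and choosing a sensible cutoff is exactly the open question
asked above.

```python
# Hypothetical log2 expression matrix: one row per probe, one value per array.
# All values and the cutoff are invented for illustration.
log_expr = {
    "probe_A": [9.1, 8.7, 9.4, 9.0],   # expressed on every array
    "probe_B": [5.2, 5.9, 9.8, 10.1],  # low in one condition, high in the other
    "probe_C": [5.1, 5.4, 5.0, 5.3],   # low on every array
}
CUTOFF = 6.0  # assumed background level, not a recommended value


def keep_probe(values, cutoff=CUTOFF):
    """Keep a probe unless it is at or below the cutoff on every array."""
    return any(v > cutoff for v in values)


# Whole-row filter: a probe is kept on all arrays or dropped on all arrays.
# Individual low values are never removed on their own.
filtered = {probe: vals for probe, vals in log_expr.items() if keep_probe(vals)}
print(sorted(filtered))
```

Here probe_C is dropped because it is low everywhere, while probe_B is
kept with all four values intact, even though two of them are below the
cutoff.  Removing just those two values would be the per-array filtering
that the quoted message warns against.  In an R workflow the same
row-wise selection would be applied to the expression matrix before the
linear modelling and eBayes() steps.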


Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
15 Cotswold Road
Sutton, Surrey SM2 5NG
United Kingdom

Tel: +44 (0) 20 8722 4109

Email: daniel.brewer at icr.ac.uk

