[BioC] Filtering affymetrix data

Robert Gentleman rgentlem at fhcrc.org
Mon Oct 10 17:28:39 CEST 2005


Hi,

Teresa Casals wrote:
> Hello
> 
> I have recently been involved in analyzing some
> microarray experiments performed with Affymetrix
> chips.
> 
> This task had previously been done by another analyst,
> who left me some scripts but no explanations.
> 
> Her procedure was first to normalize the arrays (say,
> using rma) and then, before doing any tests, to apply
> two filters:
> 
> - She kept only those genes whose signal was greater
> than a threshold on all arrays (she used "log(100)" as
> this threshold)
> - Assuming for simplicity that there were only two
> groups, she applied a second filter keeping only those
> genes where the base-2 logarithm of the difference
> between the means of the two groups was greater than
> 1.5
> 
> I think I understand the rationale behind this
> procedure, but I also find it somewhat arbitrary.
> 
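The two filters described in the quoted message can be sketched as follows. This is a minimal Python illustration, not the original analyst's scripts (which are not shown). It assumes log2-scale values, as rma produces, reads "log(100)" as log2(100) because the base is not stated, and interprets the second filter as the absolute difference of the two group means on the log2 scale exceeding 1.5 (a literal "logarithm of the difference" is undefined whenever the difference is negative). The function names, gene names, and values are hypothetical.

```python
import math
from statistics import mean

# Hypothetical cutoffs: "log(100)" is read as log2(100) because rma returns
# log2-scale values; the base used in the original scripts is unknown.
LEVEL_CUTOFF = math.log2(100)   # about 6.64
FOLD_CUTOFF = 1.5               # difference of group means, log2 scale


def level_filter(expr, cutoff=LEVEL_CUTOFF):
    """Keep genes whose signal exceeds the cutoff on every array."""
    return {g: v for g, v in expr.items() if all(x > cutoff for x in v)}


def fold_change_filter(expr, group_a, group_b, cutoff=FOLD_CUTOFF):
    """Keep genes whose group means (log2 scale) differ by more than cutoff.

    group_a and group_b are lists of array (column) indices.
    """
    kept = {}
    for g, v in expr.items():
        diff = mean(v[i] for i in group_a) - mean(v[i] for i in group_b)
        if abs(diff) > cutoff:
            kept[g] = v
    return kept


if __name__ == "__main__":
    # Toy log2 values for four arrays, two per group; not real data.
    expr = {
        "geneA": [7.0, 7.2, 9.1, 9.0],   # bright, changes between groups
        "geneB": [7.0, 7.1, 7.2, 7.0],   # bright, no change
        "geneC": [5.0, 5.1, 8.0, 8.2],   # dim on two arrays, large change
    }
    step1 = level_filter(expr)
    step2 = fold_change_filter(step1, [0, 1], [2, 3])
    print(sorted(step1), sorted(step2))  # ['geneA', 'geneB'] ['geneA']
```

In this toy example geneC would pass the fold-change filter on its own (its group means differ by about 3), but the level filter removes it first, so the choice and order of filters determine which genes survive.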

  Pretty much all gene filtering is arbitrary. I don't think there is a 
way around that unless you know a lot about the underlying biology. Some 
reduction in the number of genes assayed is necessary, so you must 
choose some method.

We have found that it is better to filter on variability rather than 
level (although at one time I was a fan of filtering on level). Choose 
some (arbitrary) amount of variability and filter out those genes that 
do not show that amount of variability across samples.
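A variability filter of the kind described above might look like the following Python sketch. It rests on assumptions not in the message: variability is measured by the interquartile range (IQR) of each gene's log2 values across all samples (the standard deviation would also work), and the 0.5 cutoff is an arbitrary, hypothetical choice. Group labels are not used; only the spread across samples matters. The function names and data are illustrative.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range of one gene's values across all samples."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def variability_filter(expr, cutoff=0.5):
    """Keep genes whose IQR across samples is at least `cutoff`.

    The cutoff (in log2 units) is hypothetical and, like any filtering
    threshold, arbitrary.
    """
    return {g: v for g, v in expr.items() if iqr(v) >= cutoff}


if __name__ == "__main__":
    # Toy log2 values for five samples; not real data.
    expr = {
        "geneA": [7.0, 7.5, 8.0, 9.0, 9.5],       # moderate level, variable
        "geneB": [10.0, 10.1, 10.0, 10.2, 10.1],  # bright but nearly flat
        "geneC": [4.0, 4.2, 5.5, 6.0, 4.1],       # dim but variable
    }
    print(sorted(variability_filter(expr)))  # ['geneA', 'geneC']
```

Here geneB is bright but nearly constant and is dropped, while geneC is dim but variable and is kept; a level filter at log2(100) would make the opposite choice for both.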

See the second paper at
  http://www.bepress.com/bioconductor/
for a more detailed discussion of the issues.

  Robert


> Could someone please tell me whether this is a
> usual/right way to proceed, or point me to some
> references or examples that would help diminish the
> feeling of arbitrariness?
> 
> Thanks for your help
> 
> ========================
> Teresa Casals
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


