[BioC] VSN, RMA, dCHIP, etc....

Sat Oct 27 19:06:28 CEST 2007

Dear Stefan,

Have you seen Rafa's work on Affycomp? See 
http://affycomp.biostat.jhsph.edu and the two Bioinformatics papers 
cited there. They provide some maps for the jungle.

In principle I second Tobias' point that the choices shouldn't make a 
big difference to the bottom-line result if you have "good" data (and 
that could almost be seen as a definition of data quality). However, 
there is a reason for the variety of methods: the following questions 
are hard to answer, and the best answers may be application-specific:

- whether and how you use the MM values

- whether and how you do probe sequence specific background correction

- whether and how you weight probe signal in a sequence specific way

- where you want to be on the bias-variance tradeoff
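
To make the first and last of these points concrete, here is a toy simulation of my own (not code from any of the packages mentioned). It assumes a purely additive model, PM = background + signal + noise and MM = background + noise, and all parameter values and function names are made up for illustration. It compares two estimates of a log2 fold change: one from PM alone, one from PM - MM.

```python
import math
import random
import statistics


def simulate_log_ratios(n=5000, background=100.0, signal_a=200.0,
                        signal_b=400.0, noise_sd=10.0, seed=1):
    """Toy model (an assumption, not any package's method):
    PM = background + signal + noise, MM = background + noise.
    Returns (bias, variance) of the estimated log2 fold change
    for a PM-only and a PM-minus-MM estimator."""
    rng = random.Random(seed)
    pm_only, pm_minus_mm = [], []
    for _ in range(n):
        pm_a = background + signal_a + rng.gauss(0, noise_sd)
        mm_a = background + rng.gauss(0, noise_sd)
        pm_b = background + signal_b + rng.gauss(0, noise_sd)
        mm_b = background + rng.gauss(0, noise_sd)
        # Ignoring MM: background compresses the ratio towards 1.
        pm_only.append(math.log2(pm_b / pm_a))
        # Subtracting MM: removes background but adds the MM noise.
        # Signals here are far above the noise, so PM - MM stays positive;
        # at low intensities it can go negative and the log is undefined.
        pm_minus_mm.append(math.log2((pm_b - mm_b) / (pm_a - mm_a)))
    true_lfc = math.log2(signal_b / signal_a)
    return {
        "pm_only": (statistics.mean(pm_only) - true_lfc,
                    statistics.variance(pm_only)),
        "pm_minus_mm": (statistics.mean(pm_minus_mm) - true_lfc,
                        statistics.variance(pm_minus_mm)),
    }


result = simulate_log_ratios()
for name, (bias, var) in result.items():
    print(f"{name}: bias={bias:.3f} variance={var:.4f}")
```

Under these assumptions the PM-only estimate is clearly biased towards zero but has the smaller variance, while PM - MM is nearly unbiased but noisier. Real probe data are messier than this (MM probes also pick up specific signal, for instance), so take it as an illustration of the tradeoff rather than an argument for either choice.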

Finally, a bigger issue than some of the gory variations in 
preprocessing methods may be the mapping of probes to target genes. The 
one you get from the manufacturer (and by extension, through our default 
CDF packages) is often not the best, and cross- or off-target 
hybridisation can be a problem.

Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

> currently evaluating the performance of different normalization strategies 
> to an Affymetrix data set, I have some semi-technical, semi-philosophical 
> questions.
> 
> Given (i) the jungle of possible normalization strategies implemented in R 
> and other platforms, (ii) the fact that most authors describe which 
> normalization strategy they used but not why they chose this and no other, 
> (iii) the sparse literature on how to find the strategy most suitable for a 
> given design/experiment/data set, I would be very grateful for any comments 
> on the following questions:
> 
> 1) Are there written or silently accepted guidelines to evaluate, choose, 
> and justify the choice of normalization strategies?
> 
> 2) What could be sensible "readouts" for the performance of a given 
> normalization strategy? (Personally, I am looking at the performance on 
> spike-in controls and a handful of known gene profiles. I am very interested 
> in complementary approaches.)
> 
> 3) Is there some literature on this issue that may have escaped my notice?
> 
> 
> Any comments on this issue would be highly appreciated.
> 
> Kind regards,
> 
> Stefan
>