[BioC] VSN, RMA, dCHIP, etc....
Wolfgang Huber
huber at ebi.ac.uk
Sat Oct 27 19:06:28 CEST 2007
Dear Stefan,
Have you seen Rafa's work on Affycomp? See
http://affycomp.biostat.jhsph.edu and the two Bioinformatics papers
cited there. They provide some maps for the jungle.
In principle I second Tobias' point that the choices shouldn't make a
big difference to the bottom-line result if you have "good" data (and
that could almost be seen as a definition of data quality). However,
there is a reason for the variety of methods, which is that the
following questions are hard to answer (and the best answers may be
application-specific):
- whether and how you use the MM values
- whether and how you do probe-sequence-specific background correction
- whether and how you weight probe signals in a sequence-specific way
- where you want to be on the variance-bias trade-off
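
To make the last point concrete, here is a minimal simulation sketch in Python. It is not any of the methods named in this thread (VSN, RMA, dChip); it only assumes a toy model in which an observed probe intensity is background plus specific signal plus additive Gaussian noise. All names and parameter values (simulate, safe_log2, the background of 100, the noise level) are hypothetical choices for illustration. It compares the log-ratio of a weakly expressed probe between two conditions with and without background subtraction.

```python
import math
import random
import statistics


def safe_log2(x, floor=1.0):
    """log2 with values below `floor` clamped, so the log is always defined."""
    return math.log2(max(x, floor))


def simulate(n_reps=5000, background=100.0, signal_a=50.0, signal_b=100.0,
             noise_sd=15.0, seed=0):
    """Compare two log-ratio estimators for one weakly expressed probe.

    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    true_lfc = math.log2(signal_b / signal_a)  # 1.0 with the defaults
    raw, corrected = [], []
    for _ in range(n_reps):
        # Toy model: intensity = background + specific signal + additive noise.
        y_a = background + signal_a + rng.gauss(0.0, noise_sd)
        y_b = background + signal_b + rng.gauss(0.0, noise_sd)
        # Estimator 1: ignore the background entirely.
        raw.append(safe_log2(y_b) - safe_log2(y_a))
        # Estimator 2: subtract the background first (assumed exactly known).
        corrected.append(safe_log2(y_b - background)
                         - safe_log2(y_a - background))
    return {
        "true_lfc": true_lfc,
        "raw": {"bias": statistics.fmean(raw) - true_lfc,
                "variance": statistics.variance(raw)},
        "corrected": {"bias": statistics.fmean(corrected) - true_lfc,
                      "variance": statistics.variance(corrected)},
    }


if __name__ == "__main__":
    result = simulate()
    for name in ("raw", "corrected"):
        print(name, result[name])
```

Under these assumptions the estimate without background subtraction should come out compressed toward zero but stable, while the background-subtracted one should sit closer to the true log-ratio but be noticeably noisier. Which of the two is preferable depends on the application.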
Finally, a bigger issue than some of the gory variations in
preprocessing methods may be the mapping of probes to target genes. The
one you get from the manufacturer (and by extension, through our default
CDF packages) is often not the best, and cross- or off-target
hybridisation can be a problem.
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
> currently evaluating the performance of different normalization strategies
> to an Affymetrix data set, I have some semi-technical, semi-philosophical
> questions.
>
> Given (i) the jungle of possible normalization strategies implemented in R
> and other platforms, (ii) the fact that most authors describe which
> normalization strategy they used but not why they chose this and no other,
> (iii) the sparse literature on how to find the strategy most suitable for a
> given design/experiment/data set, I would be very grateful for any comments
> on the following questions:
>
> 1) Are there written or silently accepted guidelines to evaluate, choose,
> and justify the choice of normalization strategies?
>
> 2) What could be sensible "readouts" for the performance of a given
> normalization strategy? (Personally, I am looking at the performance on
> spike-in controls and a handful of known gene profiles. I am very interested
> in complementary approaches.)
>
> 3) Is there some literature on this issue that may have escaped my notice?
>
>
> Any comments on this issue would be highly appreciated.
>
> Kind regards,
>
> Stefan
>