[BioC] RMA question
Wolfgang Huber
huber at ebi.ac.uk
Sun Dec 17 20:53:14 CET 2006
Hi James,
this is a general problem of normalization methods that work by adapting
arrays in a set to themselves, and not to an independent reference.
Option 1 is indeed discredited when you want to get a fair estimate of
classification rates, since it does not faithfully simulate the real
application where you want to classify a new sample.
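A minimal sketch of why option 1 leaks information, assuming plain quantile normalization (one of the steps inside RMA) stands in for the whole procedure. The data are simulated and the helper name quantile_normalize is made up for illustration; it is not a Bioconductor function. The point is only that the normalized training values change once the test arrays are added to the set.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix."""
    # Reference distribution: mean of the sorted intensities across arrays.
    ref = np.sort(x, axis=0).mean(axis=1)
    # Rank of each probe within its own array (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    return ref[ranks]

rng = np.random.default_rng(1)
# Simulated intensities; the test arrays are deliberately brighter.
train = rng.lognormal(mean=7.0, sigma=1.0, size=(500, 5))
test = rng.lognormal(mean=8.0, sigma=1.0, size=(500, 5))

train_alone = quantile_normalize(train)
train_joint = quantile_normalize(np.hstack([train, test]))[:, :5]

# Largest change in a training value caused by including the test arrays.
max_shift = np.abs(train_alone - train_joint).max()
print(max_shift)
```

If max_shift is nonzero, the training data fed to the classifier already depend on the test arrays, which is the leak described above.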
Option 2 does not work since f contains for each array a number of
array-specific, idiosyncratic parameters that reflect hybridization
conditions, labeling efficiency, RNA extraction etc. You cannot "learn"
them in advance.
The option I'd take is to look for a normalization method that
normalizes each new array individually (or in sets appropriate to your
intended application) to an existing database of reference arrays. I
know that various people on this list have been/are working on such
methods. But I am probably not up-to-date myself - maybe someone can
recommend one?
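To make the idea concrete, here is a rough sketch of normalizing one new array against a frozen reference built from the training arrays only. It assumes plain quantile normalization as the method and uses simulated data; the function names reference_quantiles and normalize_to_reference are hypothetical, and this is not full RMA (no background correction, no probe-set summarization).

```python
import numpy as np

def reference_quantiles(train):
    """Frozen reference: mean of sorted intensities over the training arrays.

    train is a probes x arrays matrix.
    """
    return np.sort(train, axis=0).mean(axis=1)

def normalize_to_reference(array, ref):
    """Map a single array onto the reference distribution by rank."""
    # Rank of each probe within this array (0 = smallest).
    ranks = np.argsort(np.argsort(array))
    return ref[ranks]

rng = np.random.default_rng(0)
# Simulated training set: 1000 probes, 50 arrays.
train = rng.lognormal(mean=7.0, sigma=1.0, size=(1000, 50))
ref = reference_quantiles(train)

# A new array with a different overall brightness and spread.
new_array = rng.lognormal(mean=7.5, sigma=1.2, size=1000)
normalized = normalize_to_reference(new_array, ref)
```

Each new array is processed on its own here, so the reference never sees the test data and the training values do not change when a test sample arrives.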
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
> Hi, I have a question about RMA normalization. Since RMA is an
> across-sample normalization, suppose I have 50 training samples (cel
> files) and 50 test samples (cel files). There are two ways to perform
> normalization:
> 1. Combine all 100 samples and normalize them together with RMA. Then
> train on the 50 training samples to classify the 50 test samples.
> 2. Run RMA on the 50 training samples only, so that each cel file is
> converted to a gene expression vector. Suppose the mapping from cel
> file to expression vector is:
> Expression = f(cel). The form of f is determined by the 50 training
> cel files. Then apply the same mapping to the test cel files.
>
> I would think method 2 is more reasonable and truly blind. However,
> it is not clear how to determine the function f from the 50 training
> cel files. Method 1 is easy to implement, but it is not truly blind,
> since the normalization of the training cel files actually uses
> information from the test cel files.
> Could anybody tell me how to determine the function f from the 50
> training cel files?
>
> Many thanks, James