[BioC] classification issues - normalization and standardization
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Jul 19 17:53:08 CEST 2011
Hi Theresa,
On Tue, Jul 19, 2011 at 3:47 AM, Theresa Brandt
<theresabrandt80 at gmail.com> wrote:
> Hi Steve,
> Thank you very much for your help. Now it is clear to me. I can do the
> array normalization (like rma) on the whole data set. Then I have to split
> the dataset and I can do things like filtering of genes or gene
> standardization only on a training set.
> I was confused after reading a book "Bioconductor Case Studies". In the
> chapter about supervised machine learning they performed non-specific gene
> filtering and gene standardization on the whole dataset. But I would rather
> trust that you are right.
I wouldn't trust that I am right ... the people who wrote that book
have some serious credentials. :-)
There are arguably lots of things you can do to all of your data,
especially if you do not use the labels on your data as part of your
preprocessing. I was just suggesting what I might do in your
situation, that's all. I have not read the book you mentioned, but
judging by the people who wrote it, I would imagine that what they are
doing in that particular scenario is also valid.
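For what it's worth, here is a rough sketch of the "training set only"
variant discussed above. It is written in Python/NumPy purely for
illustration and is not taken from the book or from any Bioconductor
package. The matrix `expr`, the 14/6 split, the cutoff of 10 genes, and
the helper names are all made up for the example. It assumes the arrays
have already been normalized together (e.g. with rma).

```python
import numpy as np

def top_variance_genes(train, k):
    """Indices of the k highest-variance genes, computed from training samples only."""
    var = train.var(axis=0, ddof=1)
    return np.sort(np.argsort(var)[-k:])

def fit_standardizer(train):
    """Per-gene mean and standard deviation, estimated from training samples only."""
    mean = train.mean(axis=0)
    sd = train.std(axis=0, ddof=1)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant genes
    return mean, sd

# Hypothetical data: 20 samples (rows) x 50 genes (columns), already normalized.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 50))

# Split first, then derive every filtering/scaling parameter from the training part.
train, test = expr[:14], expr[14:]

keep = top_variance_genes(train, k=10)   # non-specific filter, no labels used
mean, sd = fit_standardizer(train[:, keep])

train_z = (train[:, keep] - mean) / sd
test_z = (test[:, keep] - mean) / sd     # test set reuses the training parameters
```

The point of the sketch is only that the test samples never contribute
to the gene selection or to the means and standard deviations. Neither
step looks at class labels, which is why doing the same thing on the
whole data set can also be defended.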
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact