[R] LDA with previous PCA for dimensionality reduction

Fri Nov 26 11:38:51 CET 2004

Dear Cristoph, David, Torsten and Bjørn-Helge,

I think that Bjørn-Helge has made more explicit what I had in mind (which I 
think is also close to what David mentioned). In addition, at the very least, 
not placing the PCA inside the cross-validation will underestimate the 
variance in the predictions.
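
To make the distinction concrete, here is a minimal sketch of the two 
setups: leave-one-out cross-validation of an LDA on the first k principal 
components, with the PCA either refitted on each training fold ("inside") or 
fitted once on all samples ("outside"). The function name loo_pca_lda and its 
arguments k and pca_inside are made up for illustration, and iris is only a 
stand-in data set; it is not meant to show how large the difference is in any 
real problem.

```r
library(MASS)  # lda(); MASS is a recommended package shipped with R

# Leave-one-out CV of LDA on the first k principal components.
# pca_inside = TRUE  : PCA is refitted on each training fold.
# pca_inside = FALSE : PCA is fitted once on all samples, so the left-out
#                      sample has already influenced the projection.
# (Function and argument names are hypothetical, for illustration only.)
loo_pca_lda <- function(x, y, k = 2, pca_inside = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)
  pred <- character(n)
  if (!pca_inside) pca_all <- prcomp(x, center = TRUE, scale. = FALSE)
  for (i in seq_len(n)) {
    train <- x[-i, , drop = FALSE]
    test  <- x[i, , drop = FALSE]
    pca <- if (pca_inside) {
      prcomp(train, center = TRUE, scale. = FALSE)
    } else {
      pca_all
    }
    # predict() on a prcomp object centres new rows with the stored centre
    # and projects them onto the stored rotation.
    z_train <- predict(pca, train)[, 1:k, drop = FALSE]
    z_test  <- predict(pca, test)[, 1:k, drop = FALSE]
    fit <- lda(z_train, grouping = y[-i])
    pred[i] <- as.character(predict(fit, z_test)$class)
  }
  mean(pred != as.character(y))  # leave-one-out misclassification rate
}

# Stand-in example data (assumption: any numeric matrix plus a class factor)
x <- iris[, 1:4]
y <- iris$Species
err_inside  <- loo_pca_lda(x, y, k = 2, pca_inside = TRUE)
err_outside <- loo_pca_lda(x, y, k = 2, pca_inside = FALSE)
```

Comparing err_inside and err_outside over repeated resamples of one's own 
data would show whether the "outside" version is noticeably optimistic or 
less variable there.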

Best,

R.

On Thursday 25 November 2004 15:05, Bjørn-Helge Mevik wrote:
> Torsten Hothorn writes:
> > as long as one does not use the information in the response (the class
> > variable, in this case) I don't think that one ends up with an
> > optimistically biased estimate of the error
>
> I would be a little careful, though.  The left-out sample in the
> LDA-cross-validation, will still have influenced the PCA used to build
> the LDA on the rest of the samples.  The sample will have a tendency
> to lie closer to the centre of the "complete" PCA than of a PCA on the
> remaining samples.  Also, if the sample has a high leverage on the
> PCA, the directions of the two PCAs can be quite different.  Thus, the
> LDA is built on data that "fits" better to the left-out sample than if
> the sample was a completely new sample.
>
> I have no proofs or numerical studies showing that this gives
> over-optimistic error rates, but I would not recommend placing the PCA
> "outside" the cross-validation.  (The same for any resampling-based
> validation.)

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)