[R] LDA with previous PCA for dimensionality reduction

Ramon Diaz-Uriarte rdiaz at cnio.es
Fri Nov 26 11:38:51 CET 2004

Dear Christoph, David, Torsten and Bjørn-Helge,

I think Bjørn-Helge has made explicit what I had in mind (which I think is 
also close to what David mentioned). In addition, at the very least, leaving 
the PCA outside the cross-validation will underestimate the variance in the 
predictions, because the variability that comes from estimating the principal 
components is then not accounted for.



On Thursday 25 November 2004 15:05, Bjørn-Helge Mevik wrote:
> Torsten Hothorn writes:
> > as long as one does not use the information in the response (the class
> > variable, in this case) I don't think that one ends up with an
> > optimistically biased estimate of the error
> I would be a little careful, though.  The left-out sample in the
> LDA-cross-validation, will still have influenced the PCA used to build
> the LDA on the rest of the samples.  The sample will have a tendency
> to lie closer to the centre of the "complete" PCA than of a PCA on the
> remaining samples.  Also, if the sample has a high leverage on the
> PCA, the directions of the two PCAs can be quite different.  Thus, the
> LDA is built on data that "fits" better to the left-out sample than if
> the sample was a completely new sample.
> I have no proofs or numerical studies showing that this gives
> over-optimistic error rates, but I would not recommend placing the PCA
> "outside" the cross-validation.  (The same for any resampling-based
> validation.)
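
To make the distinction concrete, here is a minimal sketch in R of the two 
ways of combining PCA, LDA and cross-validation. Everything specific in it is 
an assumption for illustration only: the data are simulated, the number of 
components (5) and of folds (10) are arbitrary, and prcomp() with MASS::lda() 
is just one possible choice of functions. It is not meant as evidence about 
the size or direction of any bias.

```r
library(MASS)  # for lda(); MASS is a recommended package shipped with R

set.seed(1)

# Simulated data (assumption): n samples, p variables, two classes that
# differ only in the mean of the first 5 variables.
n <- 60
p <- 200
y <- factor(rep(c("a", "b"), each = n / 2))
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[y == "b", 1:5] <- x[y == "b", 1:5] + 1

n_comp <- 5   # number of principal components kept (arbitrary)
k <- 10       # number of cross-validation folds (arbitrary)
fold <- sample(rep(seq_len(k), length.out = n))

# (1) PCA inside the cross-validation: centring and rotation are
#     estimated from the training samples of each fold only.
pred_inside <- character(n)
for (i in seq_len(k)) {
  test <- fold == i
  pca <- prcomp(x[!test, , drop = FALSE], center = TRUE, scale. = FALSE)
  z_train <- pca$x[, seq_len(n_comp), drop = FALSE]
  # predict() centres the left-out samples with the training means and
  # projects them onto the training rotation.
  z_test <- predict(pca, x[test, , drop = FALSE])[, seq_len(n_comp), drop = FALSE]
  fit <- lda(z_train, grouping = y[!test])
  pred_inside[test] <- as.character(predict(fit, z_test)$class)
}

# (2) PCA outside the cross-validation: the left-out samples have
#     already influenced the components used to build each LDA.
pca_all <- prcomp(x, center = TRUE, scale. = FALSE)
z_all <- pca_all$x[, seq_len(n_comp), drop = FALSE]
pred_outside <- character(n)
for (i in seq_len(k)) {
  test <- fold == i
  fit <- lda(z_all[!test, , drop = FALSE], grouping = y[!test])
  pred_outside[test] <- as.character(
    predict(fit, z_all[test, , drop = FALSE])$class
  )
}

# Cross-validated error rates of the two schemes
err_inside  <- mean(pred_inside  != as.character(y))
err_outside <- mean(pred_outside != as.character(y))
c(inside = err_inside, outside = err_outside)
```

Repeating this over many simulated data sets (different seeds) and comparing 
both the mean and the spread of the two error estimates would be one way to 
check numerically the concerns raised above.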

Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

PGP KeyID: 0xE89B3462
