[R] LDA with previous PCA for dimensionality reduction
Ramon Diaz-Uriarte
rdiaz at cnio.es
Fri Nov 26 11:38:51 CET 2004
Dear Cristoph, David, Torsten and Bjørn-Helge,
I think that Bjørn-Helge has made more explicit what I had in mind (which I
think is also close to what David mentioned). In addition, at the very least,
leaving the PCA outside the cross-validation will underestimate the variance
in the predictions.
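To make "inside the cross-validation" concrete, here is a minimal sketch of a
leave-one-out loop in which the PCA is refit on the training samples at every
fold, and the left-out sample is only projected onto those components. The
function name loocv_pca_lda, the ncomp argument, and the use of the built-in
iris data are my own assumptions for illustration; they do not come from the
thread.

```r
library(MASS)  # provides lda()

# Leave-one-out error for PCA followed by LDA, with the PCA refit per fold.
# x: numeric matrix (samples in rows); y: factor of class labels;
# ncomp: number of principal components kept (hypothetical choice).
loocv_pca_lda <- function(x, y, ncomp = 2) {
  n <- nrow(x)
  pred <- character(n)
  for (i in seq_len(n)) {
    # PCA on the training samples only; sample i plays no part in it
    pca <- prcomp(x[-i, , drop = FALSE], center = TRUE, scale. = FALSE)
    train_scores <- pca$x[, seq_len(ncomp), drop = FALSE]
    # Project the left-out sample with the training centre and loadings
    test_scores <- predict(pca, newdata = x[i, , drop = FALSE])
    test_scores <- test_scores[, seq_len(ncomp), drop = FALSE]
    # LDA on the training scores, then classify the left-out sample
    fit <- lda(train_scores, grouping = y[-i])
    pred[i] <- as.character(predict(fit, test_scores)$class)
  }
  # Proportion of left-out samples that were misclassified
  mean(pred != as.character(y))
}

# Example data (an assumption: the built-in iris data set)
x <- as.matrix(iris[, 1:4])
y <- iris$Species
err_inside <- loocv_pca_lda(x, y, ncomp = 2)
```

The "outside" variant would call prcomp() once on all of x and cross-validate
only the LDA step. Running both on the same data is one way to see how much
the two error estimates differ for a given problem.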
Best,
R.
On Thursday 25 November 2004 15:05, Bjørn-Helge Mevik wrote:
> Torsten Hothorn writes:
> > as long as one does not use the information in the response (the class
> > variable, in this case) I don't think that one ends up with an
> > optimistically biased estimate of the error
>
> I would be a little careful, though. The left-out sample in the
> LDA-cross-validation, will still have influenced the PCA used to build
> the LDA on the rest of the samples. The sample will have a tendency
> to lie closer to the centre of the "complete" PCA than of a PCA on the
> remaining samples. Also, if the sample has a high leverage on the
> PCA, the directions of the two PCAs can be quite different. Thus, the
> LDA is built on data that "fits" better to the left-out sample than if
> the sample was a completely new sample.
>
> I have no proofs or numerical studies showing that this gives
> over-optimistic error rates, but I would not recommend placing the PCA
> "outside" the cross-validation. (The same for any resampling-based
> validation.)
--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)