[R] help: pls package
Bjørn-Helge Mevik
bhx2 at mevik.net
Fri Jul 22 12:50:22 CEST 2005
wu sz writes:
> trainSet = as.data.frame(scale(trainSet, center = T, scale = T))
> trainSet.plsr = mvr(formula, ncomp = 14, data = trainSet, method = "kernelpls",
> CV = TRUE, validation = "LOO", model = TRUE, x = TRUE,
> y = TRUE)
[Two side notes here:
1) scaling of the data (with its sd) should be performed inside the
cross-validation. In the current version of 'pls', one can use
cvplsr <- crossval(plsr(y ~ scale(X), ncomp = 14, data = mydata),
length.seg = 1)
(However, 'crossval' is slower than the built-in cross-validation on
'mvr'/'plsr'. In the development version of the package, scaling
within the cross-validation has been implemented in the built-in
cross-validation. This will hopefully be published shortly.)
2) The 'CV' argument is from the earlier 'pls.pcr' package, and is no
longer used. It is silently ignored.]
> i = 1; msep_element = c()
> while(i <= length(p)){
> msep_element[,i] = (p[i]-y)^2
> i = i + 1
> }
Hmm... I don't see how you got that code to run. This should work, though:
msep_element <- (p - y)^2
> msep = colMeans(msep_element)
> msep_sd = sd(msep_element)
You will get much closer to the true value with
sd(msep_element) / sqrt(length(y))
However, this will not produce an unbiased estimate of the sd of the
estimated MSEP, because it ignores the depencies between the
residuals. E.g., the residual when sample 1 is predicted is not
independent of the residual when sample 2 is predicted. In general, I
think, it will produce underestimated sds. The effect should be
largest for small data sets.
This is the reason the pls package currently doesn't estimate se of
cross-validated MSEPs. There is also the question of what the
estimated should be conditioned on: for leave-one-out
cross-validation, sd(MSEP | trainData) = 0.
[If someone knows how to calculate unbiased estimates of
cross-validated MSEPs, please let me know. :-)]
--
Bjørn-Helge Mevik
More information about the R-help
mailing list