[R] resampling syntax for caret package
Juliet Hannah
juliet.hannah at gmail.com
Fri Apr 6 18:58:41 CEST 2012
Max and List,
Could you advise me if I am using the proper caret syntax to carry out
leave-one-out cross validation? In the example below, I use example
data from the rda package. I use caret to tune over a grid and select
an optimal value. I think I am then using the optimal selection for
prediction. So there are two rounds of resampling with the first one
taken care of by caret's train function.
My overall question is whether I must carry out the outer resampling
plan manually, as it seems I have to.
On another note, I usually get the warning
1: In train.default(colon.x[-holdout, ], outcome[-holdout], method = "pam", :
At least one of the class levels are not valid R variables names;
This may cause errors if class probabilities are generated because the
variables names will be converted to: X1, X2
2: executing %dopar% sequentially: no parallel backend registered
When I change the variable names, caret gives me predictions as a
numeric value corresponding to the ordered level. Have I missed
something here?
Thanks,
Juliet
# start example
library(caret)

# to obtain data
library(rda)
data(colon)

# add column names A1, A2, ...
myind <- seq_len(ncol(colon.x))
mynames <- paste("A", myind, sep = "")
colnames(colon.x) <- mynames

outcome <- factor(as.character(colon.y), levels = c("1", "2"))
cv_index <- seq_along(outcome)
# numeric placeholder for the held-out predictions
predictions <- rep(-1, length(cv_index))

# tuning grid for the pam threshold
pamGrid <- seq(0.1, 5, by = 0.2)
pamGrid <- data.frame(.threshold = pamGrid)

# manual leave-one-out (outer loop); train() runs the inner cross validation
for (holdout in cv_index) {
  pamFit1 <- train(colon.x[-holdout, ], outcome[-holdout],
                   method = "pam",
                   tuneGrid = pamGrid,
                   trControl = trainControl(method = "cv"))
  predictions[holdout] <- predict(pamFit1,
                                  newdata = colon.x[holdout, , drop = FALSE])
}
# end example
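
A possible explanation for the numeric predictions, offered as a sketch rather than a confirmed diagnosis: in base R, assigning a factor into a numeric vector stores the integer level code, not the label. With levels "1" and "2" the codes happen to match the labels, so the effect would only show up after renaming. The labels "normal" and "tumor" and all object names below are hypothetical, chosen only for illustration; nothing here depends on caret.

```r
# Hypothetical class labels, used only for illustration.
outcome_demo <- factor(c("normal", "tumor", "tumor"),
                       levels = c("normal", "tumor"))

# Numeric placeholder, as in the loop above: a factor assigned into it
# is stored as its integer level code.
pred_numeric <- rep(-1, length(outcome_demo))
pred_numeric[2] <- outcome_demo[2]

# Character placeholder: converting explicitly keeps the label.
pred_label <- rep(NA_character_, length(outcome_demo))
pred_label[2] <- as.character(outcome_demo[2])

# Rebuild a factor with the original levels once all slots are filled.
pred_factor <- factor(pred_label, levels = levels(outcome_demo))

# The renaming mentioned in the warning matches what make.names() does
# to labels that are not syntactically valid R names.
valid_names <- make.names(c("1", "2"))
```

If this is indeed the cause, wrapping the predict() call in as.character() and initialising predictions as a character vector should keep the class labels in the loop above.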
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] pamr_1.54 survival_2.36-12 e1071_1.6 class_7.3-3
[5] rda_1.0.2 caret_5.15-023 foreach_1.3.5 codetools_0.2-8
[9] iterators_1.0.5 cluster_1.14.2 reshape_0.8.4 plyr_1.7.1
[13] lattice_0.20-6
loaded via a namespace (and not attached):
[1] compiler_2.14.2 grid_2.14.2 tools_2.14.2