[R] resampling syntax for caret package

Fri Apr 6 18:58:41 CEST 2012

Max and List,

Could you advise me whether I am using the proper caret syntax to carry
out leave-one-out cross-validation? In the example below, I use example
data from the rda package. I use caret to tune over a grid and select
an optimal value, and I believe I then use that optimal value for
prediction. So there are two rounds of resampling, with the inner one
taken care of by caret's train function.

My overall question: it seems that I must carry out the outer
resampling loop manually. Is that correct?

On another note, I usually get these warnings:

1: In train.default(colon.x[-holdout, ], outcome[-holdout], method = "pam",  :
  At least one of the class levels are not valid R variables names;
This may cause errors if class probabilities are generated because the
variables names will be converted to: X1, X2
2: executing %dopar% sequentially: no parallel backend registered
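
The names X1 and X2 in the first warning match what base R's
make.names() produces for the labels "1" and "2". A minimal sketch,
using a toy outcome vector rather than the colon data; the labels
class1 and class2 are hypothetical:

```r
# Class labels as used in the example: "1" and "2"
lev <- c("1", "2")

# make.names() derives syntactically valid R names from arbitrary strings
valid <- make.names(lev)
print(valid)   # "X1" "X2"

# One option (an assumption, not something the warning prescribes):
# relabel the factor so that its levels are already valid names.
y <- c(1, 2, 2, 1)   # toy outcome, not the colon data
outcome_toy <- factor(y, levels = c(1, 2),
                      labels = c("class1", "class2"))
print(levels(outcome_toy))
```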

When I change the variable names, caret gives me predictions as
numeric values corresponding to the ordered levels. Have I missed
something here?
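
One possible explanation, offered as an assumption rather than a
confirmed diagnosis: in base R, assigning a factor into a numeric
vector stores the factor's integer codes rather than its labels. A
minimal sketch with a toy factor standing in for the value returned by
predict():

```r
# Toy factor standing in for a single class prediction
pred <- factor("b", levels = c("a", "b"))

# Numeric container, initialised the same way as rep(-1, n)
out_num <- rep(-1, 3)
out_num[1] <- pred            # keeps only the integer code of level "b"
print(out_num[1])             # 2

# Character container: convert explicitly so the label survives
out_chr <- rep(NA_character_, 3)
out_chr[1] <- as.character(pred)
print(out_chr[1])             # "b"

# The labels can be turned back into a factor once the vector is filled
out_fac <- factor(out_chr, levels = levels(pred))
```

If this is the cause, storing as.character() of each prediction in a
character vector would keep the class labels.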

Thanks,

Juliet

# start example

library(caret)
# to obtain data
library(rda)

data(colon)

# add column names A1, A2, ...
myind <- seq_len(ncol(colon.x))
mynames <- paste("A", myind, sep = "")
colnames(colon.x) <- mynames

outcome <- factor(as.character(colon.y), levels = c("1", "2"))

cv_index <- seq_along(outcome)
predictions <- rep(-1, length(cv_index))

pamGrid <- seq(0.1, 5, by = 0.2)
pamGrid <- data.frame(.threshold = pamGrid)

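# manual leave-one-out (outer loop); train() runs the inner
# cross-validation over pamGrid on the remaining samples
for (holdout in cv_index) {
    pamFit1 <- train(colon.x[-holdout, ], outcome[-holdout],
                     method = "pam",
                     tuneGrid = pamGrid,
                     trControl = trainControl(method = "cv"))

    # predict the single held-out sample with the tuned model
    predictions[holdout] <- predict(pamFit1,
                                    newdata = colon.x[holdout, , drop = FALSE])
}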

# end example

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] pamr_1.54        survival_2.36-12 e1071_1.6        class_7.3-3
 [5] rda_1.0.2        caret_5.15-023   foreach_1.3.5    codetools_0.2-8
 [9] iterators_1.0.5  cluster_1.14.2   reshape_0.8.4    plyr_1.7.1
[13] lattice_0.20-6

loaded via a namespace (and not attached):
[1] compiler_2.14.2 grid_2.14.2     tools_2.14.2