[R] Splitting dataset for Tuning Parameter with Cross Validation

Max Kuhn mxkuhn at gmail.com
Mon Jul 13 16:37:36 CEST 2009


Here is what the train() function in the caret package does by default
(you can change this behavior; see below).

Using the entire data set, train() estimates the RBF kernel parameter
(sigma) with the sigest() function in the kernlab package (which, if I
recall correctly, involves the median of a sample of kernel matrix
values).

Using this fixed value, the cost parameter is varied over a common set
of held-out samples. More specifically, every value of the cost
parameter is evaluated on the exact same folds. I've been able to
achieve pretty good performance this way in almost every case where
I've done the comparison.
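A minimal sketch of what that looks like in code (again with iris as
placeholder data; the number of folds and the number of candidate cost
values are just examples):

library(caret)

## 10-fold CV; every candidate cost value is assessed on the same folds
ctrl <- trainControl(method = "cv", number = 10)

set.seed(1)
fit <- train(iris[, 1:4], iris$Species,
             method = "svmRadial",
             tuneLength = 5,   ## 5 cost values, sigma fixed via sigest()
             trControl = ctrl)

## one row per cost value, resampled on the same folds
fit$results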

Based on these performance values, you can select the cost value with
the best performance. There are also ways of selecting the simplest
model that is within the uncertainty of the numerically optimal model
(that is done using the selectionFunction argument of trainControl()).
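For example, to pick the simplest model within one standard error of
the numerically best one:

## "oneSE" selects the least complex model whose performance is within
## one standard error of the best; "tolerance" is another built-in option
ctrl <- trainControl(method = "cv", number = 10,
                     selectionFunction = "oneSE")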

I should also note that you can tune across any grid of cost and sigma
(this is done via the tuneGrid argument of train()).
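Something along these lines (the particular grid values here are
arbitrary; depending on your version of caret, the column names may
need a leading dot, i.e. .sigma and .C):

## explicit grid over both tuning parameters
grid <- expand.grid(sigma = c(0.01, 0.05, 0.1),
                    C = 2^(-2:4))

fit2 <- train(iris[, 1:4], iris$Species,
              method = "svmRadial",
              tuneGrid = grid,
              trControl = ctrl)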

Max



