[R] caret: Error when using rpart and CV != LOOCV
Max Kuhn
mxkuhn at gmail.com
Wed May 16 17:30:58 CEST 2012
More information is needed to be sure, but it is most likely that some
of the resampled rpart models produce the same prediction for the
hold-out samples (likely the result of no viable split being found).
Almost every incarnation of R^2 requires the variance of the
prediction. This particular failure mode would result in a divide by
zero.
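
To illustrate the failure mode, here is a minimal base-R sketch. It assumes R^2 is computed as the squared correlation between predictions and observations, and the constant prediction of 30 and the offsets are arbitrary values chosen for illustration:

```r
obs <- trees$Volume[1:5]                 # a small stand-in for a hold-out set
pred_const <- rep(30, length(obs))       # arbitrary constant, as from a tree with no split
pred_vary  <- obs + c(1, -1, 2, -2, 0)   # arbitrary predictions that do vary

# Squared correlation; the zero-variance warning from cor() is suppressed
r2 <- function(pred, obs) suppressWarnings(cor(pred, obs))^2

r2(pred_const, obs)   # NA: the predictions have zero variance
r2(pred_vary, obs)    # an ordinary value between 0 and 1
```

A single resample like the first one is enough to put a missing value into the resampled performance measures.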
Try using your own summary function (see ?trainControl) and put a
print(summary(data$pred)) in there to verify my claim.
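
A rough sketch of such a summary function follows. The name verboseSummary is made up for illustration, the seed is arbitrary, and RMSE and R^2 are computed by hand here as an assumption rather than taken from caret's own summary code. The resampling part only runs if caret and rpart are installed:

```r
# Hypothetical summary function. caret calls it once per resample with a
# data frame that has `obs` and `pred` columns.
verboseSummary <- function(data, lev = NULL, model = NULL) {
  print(summary(data$pred))   # identical Min. and Max. means constant predictions
  rmse <- sqrt(mean((data$pred - data$obs)^2))
  # Return NA explicitly when the predictions do not vary
  rsq <- if (length(unique(data$pred)) > 1) cor(data$pred, data$obs)^2 else NA_real_
  c(RMSE = rmse, Rsquared = rsq)
}

# Run the poster's 'cv' test case with the custom summary function
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("rpart", quietly = TRUE)) {
  library(caret)
  set.seed(1)   # arbitrary seed so the folds are repeatable
  tc <- trainControl(method = "cv", summaryFunction = verboseSummary)
  fit <- train(Volume ~ Girth + Height, data = trees,
               method = "rpart", trControl = tc)
}
```

If any printed summary shows the same value for Min. and Max., that resample is the source of the missing R^2.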
Max
> On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn <dominik at dbruhn.de> wrote:
>> Hi,
>> I ran into the following problem when trying to build an rpart model
>> with any resampling method except LOOCV. Originally, I wanted to use
>> k-fold partitioning, but every method except LOOCV throws the
>> following warning:
>>
>> ----
>> Warning message: In nominalTrainWorkflow(dat = trainData, info =
>> trainInfo, method = method, : There were missing values in resampled
>> performance measures.
>> -----
>>
>> Below are some simplified test cases which reproduce the warning on my
>> system.
>>
>> Question: What does this warning mean? How can I avoid it?
>>
>> System-Information:
>> -----
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] rpart_3.1-52 caret_5.15-023 foreach_1.4.0 cluster_1.14.2 reshape_0.8.4
>> [6] plyr_1.7.1 lattice_0.20-6
>>
>> loaded via a namespace (and not attached):
>> [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
>> [5] tools_2.15.0
>> -------
>>
>>
>> Simplified Testcase I: Throws warning
>> ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> train(formula, data=trees, method='rpart')
>> ---
>>
>> Simplified Testcase II: Every other CV method also throws the warning,
>> for example 'cv':
>> ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> tc=trainControl(method='cv')
>> train(formula, data=trees, method='rpart', trControl=tc)
>> ---
>>
>> Simplified Testcase III: The only CV method that works is 'LOOCV':
>> ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> tc=trainControl(method='LOOCV')
>> train(formula, data=trees, method='rpart', trControl=tc)
>> ---
>>
>>
>> Thanks!
>> --
>> Dominik Bruhn
>> mailto: dominik at dbruhn.de