[R] Inconsistent results between caret+kernlab versions
Andrew Digby
andrewdigby at mac.com
Fri Nov 15 01:31:40 CET 2013
I'm using caret to assess classifier performance (and it's great!). However, I've found that my results differ between R 2.x and R 3.x: the reported accuracies drop dramatically, from about 0.76 to about 0.40 at C = 1. I suspect that a code change involving kernlab's ksvm may be responsible (see the entry for version 5.16-24 in the caret news: http://cran.r-project.org/web/packages/caret/news.html). The two setups are caret_5.15-61 + kernlab_0.9-17 and caret_5.17-7 + kernlab_0.9-19 (results for both are below).
Can anyone please shed any light on this?
Thanks very much!
### To replicate:
require(repmis) # For downloading from https
df <- source_data('https://dl.dropboxusercontent.com/u/47973221/data.csv', sep=',')
require(caret)
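# Note: the argument is tuneLength (case-sensitive). Spelled 'tunelength', as in the
# runs reported below, train() does not match it and the default of 3 values of C is used.
svm.m1 <- train(df[,-1], df[,1], method='svmRadial', metric='Kappa', tuneLength=5,
                trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))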
svm.m1
sessionInfo()
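One way to narrow this down is to take caret out of the loop and call kernlab directly with a fixed seed and a fixed sigma. The two runs report different sigma values (0.247 and 0.215), so holding sigma constant removes one source of variation. What follows is only a minimal sketch under assumptions: the built-in iris data stands in for the data set used here, and the seed and the names `fixed_sigma`, `sigma_range` and `cv_error` are illustrative. Running the same script under both installations and comparing the printed accuracy would suggest whether kernlab alone behaves differently.

```r
# Sketch: call kernlab::ksvm directly with a fixed sigma and a fixed seed,
# so the result depends on kernlab alone and not on caret's tuning or resampling.
library(kernlab)

# Stand-in data (assumption): iris replaces the data set from the post.
x <- as.matrix(iris[, -5])
y <- iris$Species

# Record the version in use so runs can be compared across installations.
kernlab_version <- as.character(packageVersion("kernlab"))

# sigest() draws random samples, so the sigma range it suggests can vary between runs.
set.seed(1)
sigma_range <- sigest(x, frac = 1)   # three values spanning the suggested range

# Hold sigma and C fixed instead of letting sigma be estimated.
fixed_sigma <- 0.247   # value reported in the R 2.15.2 run
set.seed(1)
fit <- ksvm(x, y, type = "C-svc", kernel = "rbfdot",
            kpar = list(sigma = fixed_sigma), C = 1, cross = 10)

cv_error <- cross(fit)   # 10-fold cross-validation error from kernlab itself
cat("kernlab", kernlab_version, "CV accuracy:", 1 - cv_error, "\n")
```

If the accuracy from this direct call agrees across the two kernlab versions, the difference is more likely to come from how caret sets up the model than from ksvm itself.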
### Results - R2.15.2
> svm.m1
1241 samples
7 predictors
10 classes: ‘O27479’, ‘O31403’, ‘O32057’, ‘O32059’, ‘O32060’, ‘O32078’, ‘O32089’, ‘O32663’, ‘O32668’, ‘O32676’
No pre-processing
Resampling: Cross-Validation (10 fold, repeated 10 times)
Summary of sample sizes: 1116, 1116, 1114, 1118, 1118, 1119, ...
Resampling results across tuning parameters:
  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.684     0.63   0.0353       0.0416
  0.5   0.729     0.685  0.0379       0.0445
  1     0.756     0.716  0.0357       0.0418
Tuning parameter ‘sigma’ was held constant at a value of 0.247
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1 and sigma = 0.247.
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-1 class_7.3-5 kernlab_0.9-17 repmis_0.2.4 caret_5.15-61 reshape2_1.2.2 plyr_1.8 lattice_0.20-10 foreach_1.4.0 cluster_1.14.3
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.2 digest_0.6.0 evaluate_0.4.3 formatR_0.7 grid_2.15.2 httr_0.2 iterators_1.0.6 knitr_1.1 RCurl_1.95-4.1 stringr_0.6.2 tools_2.15.2
### Results - R3.0.2
> require(caret)
> svm.m1 <- train(df[,-1],df[,1],method='svmRadial',metric='Kappa',tunelength=5,trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
Loading required package: class
Warning messages:
1: closing unused connection 4 (https://dl.dropboxusercontent.com/u/47973221/df.Rdata)
2: executing %dopar% sequentially: no parallel backend registered
> svm.m1
1241 samples
7 predictors
10 classes: ‘O27479’, ‘O31403’, ‘O32057’, ‘O32059’, ‘O32060’, ‘O32078’, ‘O32089’, ‘O32663’, ‘O32668’, ‘O32676’
No pre-processing
Resampling: Cross-Validation (10 fold, repeated 10 times)
Summary of sample sizes: 1118, 1117, 1115, 1117, 1116, 1118, ...
Resampling results across tuning parameters:
  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.372     0.278  0.033        0.0371
  0.5   0.39      0.297  0.0317       0.0358
  1     0.399     0.307  0.0289       0.0323
Tuning parameter ‘sigma’ was held constant at a value of 0.2148907
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1 and sigma = 0.215.
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-1 class_7.3-9 kernlab_0.9-19 repmis_0.2.6.2 caret_5.17-7 reshape2_1.2.2 plyr_1.8 lattice_0.20-24 foreach_1.4.1 cluster_1.14.4
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.2 digest_0.6.3 grid_3.0.2 httr_0.2 iterators_1.0.6 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2