[R] Inconsistent results between caret+kernlab versions
Andrew Digby
andrewdigby at mac.com
Mon Nov 18 02:23:21 CET 2013
Hi Max,
Thanks very much for investigating and explaining that - your help and time are much appreciated.
So as I understand it, using classProbs=F in trainControl() will give me the same accuracy results as before. However, I was relying on the class probabilities to return ROC/sensitivity/specificity, using a custom function similar to twoClassSummary().
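For reference, a summary function along those lines can be written with base R alone. This is only a sketch under assumed conventions (caret passes a data frame with columns obs and pred plus one probability column per class level in lev), and it handles only the two-class case:

```r
# Sketch of a twoClassSummary()-style function using only base R.
# Assumes caret's convention: `data` has columns obs (truth), pred
# (predicted class), and one probability column per level in `lev`.
customSummary <- function(data, lev = NULL, model = NULL) {
  pos <- data$obs == lev[1]
  # AUC via the Mann-Whitney rank statistic on the first level's probabilities
  r <- rank(data[, lev[1]])
  auc <- (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) /
    (sum(pos) * sum(!pos))
  c(ROC  = auc,
    Sens = mean(data$pred[pos]  == lev[1]),   # sensitivity for lev[1]
    Spec = mean(data$pred[!pos] == lev[2]))   # specificity (lev[2] correct)
}
```

It would be passed to trainControl() via summaryFunction in the usual way.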
What I still don't quite understand is which accuracy values from train() I should trust: those with classProbs=TRUE or with classProbs=FALSE? I'm using train() to compare different classification methods on several statistics (accuracy, AUROC etc.), and this change means that SVM suddenly looks much worse on accuracy! I take it this means I should roll back to the earlier versions of caret and kernlab (which is a pain, because train() then often crashes with 'memory map' errors)?
Thanks,
Andrew
On 16/11/2013, at 09:59 , Max Kuhn <mxkuhn at gmail.com> wrote:
> Or not!
>
> The issue is with kernlab.
>
> Background: SVM models do not naturally produce class probabilities. A
> secondary model (via Platt scaling) is fit to the raw model output, and a
> logistic function is used to translate the raw SVM output into
> probability-like numbers (i.e. between 0 and 1, summing to one). In
> ksvm(), you need to use the option prob.model = TRUE to get that
> second model.
>
> I discovered some time ago that there can be a discrepancy between the
> predicted classes that come directly from the SVM model and those
> derived by taking the class with the largest class
> probability. This is most likely due to natural error in the secondary
> probability model and should not be unexpected.
>
> That is the case for your data. If you use the same tuning parameters
> as those suggested by train() and go straight to ksvm():
>
>> newSVM <- ksvm(x = as.matrix(df[,-1]),
> + y = df[,1],
> + kernel = rbfdot(sigma = svm.m1$bestTune$.sigma),
> + C = svm.m1$bestTune$.C,
> + prob.model = TRUE)
>>
>> predict(newSVM, df[43,-1])
> [1] O32078
> 10 Levels: O27479 O31403 O32057 O32059 O32060 O32078 ... O32676
>> predict(newSVM, df[43,-1], type = "probabilities")
> O27479 O31403 O32057 O32059 O32060 O32078
> [1,] 0.08791826 0.05911645 0.2424997 0.1036943 0.06968587 0.1648394
> O32089 O32663 O32668 O32676
> [1,] 0.04890477 0.05210836 0.09838892 0.07284396
>
> Note that, based on the probability model, the class with the largest
> probability is O32057 (p = 0.24) while the basic SVM model predicts
> O32078 (p = 0.16).
>
> Somebody (maybe me) saw this discrepancy, and that led me to follow this rule:
>
> if(prob.model = TRUE) use the class with the maximum probability
> else use the class prediction from ksvm().
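> In R terms, the first branch of that rule amounts to taking the column-wise maximum of the probability matrix. A sketch (hypothetical helper, not caret's actual internals):

```r
# Sketch of the "use the class with the maximum probability" branch.
# `probs` is a matrix of class probabilities: one row per sample, one
# column per class, as returned by predict(..., type = "probabilities").
classFromProbs <- function(probs) {
  factor(colnames(probs)[max.col(probs, ties.method = "first")],
         levels = colnames(probs))
}
```

> With prob.model = TRUE this would be applied to predict(fit, newdata, type = "probabilities"); otherwise the class prediction comes from predict(fit, newdata) directly.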
>
> Therefore:
>
>> predict(svm.m1, df[43,-1])
> [1] O32057
> 10 Levels: O27479 O31403 O32057 O32059 O32060 O32078 ... O32676
>
> That change occurred between the two caret versions that you tested with.
>
> (On a side note, this can also occur with ksvm() and rpart() if
> cost-sensitive training is used, because the class designation takes
> the costs into account but the class probability predictions do not. I
> alerted both package maintainers to the issue some time ago.)
>
> HTH,
>
> Max
>
> On Fri, Nov 15, 2013 at 1:56 PM, Max Kuhn <mxkuhn at gmail.com> wrote:
>> I've looked into this a bit and the issue seems to be with caret. I've
>> been looking at the svn check-ins and nothing stands out to me as the
>> cause so far. The final models that are generated are the same, and
>> I'll try to figure out where the difference arises.
>>
>> Two small notes:
>>
>> 1) you should set the seed to ensure reproducibility.
>> 2) you really shouldn't use character strings that are all numbers as
>> factor levels with caret when you want class probabilities. It should
>> give you a warning about this.
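>> Both points in miniature (variable names hypothetical):

```r
set.seed(123)   # before train(): makes the resampling splits reproducible

# caret wants factor levels that are valid R variable names when
# classProbs = TRUE; make.names() fixes all-numeric labels.
y <- factor(c("1", "2", "1"))
levels(y) <- make.names(levels(y))   # "1" -> "X1", "2" -> "X2"
```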
>>
>> Max
>>
>> On Thu, Nov 14, 2013 at 7:31 PM, Andrew Digby <andrewdigby at mac.com> wrote:
>>>
>>> I'm using caret to assess classifier performance (and it's great!). However, I've found that my results differ between R2.* and R3.* - reported accuracies are reduced dramatically. I suspect that a code change to kernlab ksvm may be responsible (see version 5.16-24 here: http://cran.r-project.org/web/packages/caret/news.html). I get very different results between caret_5.15-61 + kernlab_0.9-17 and caret_5.17-7 + kernlab_0.9-19 (see below).
>>>
>>> Can anyone please shed any light on this?
>>>
>>> Thanks very much!
>>>
>>>
>>> ### To replicate:
>>>
>>> require(repmis) # For downloading from https
>>> df <- source_data('https://dl.dropboxusercontent.com/u/47973221/data.csv', sep=',')
>>> require(caret)
>>> svm.m1 <- train(df[,-1], df[,1], method='svmRadial', metric='Kappa', tuneLength=5, trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
>>> svm.m1
>>> sessionInfo()
>>>
>>> ### Results - R2.15.2
>>>
>>>> svm.m1
>>> 1241 samples
>>> 7 predictors
>>> 10 classes: ‘O27479’, ‘O31403’, ‘O32057’, ‘O32059’, ‘O32060’, ‘O32078’, ‘O32089’, ‘O32663’, ‘O32668’, ‘O32676’
>>>
>>> No pre-processing
>>> Resampling: Cross-Validation (10 fold, repeated 10 times)
>>>
>>> Summary of sample sizes: 1116, 1116, 1114, 1118, 1118, 1119, ...
>>>
>>> Resampling results across tuning parameters:
>>>
>>> C Accuracy Kappa Accuracy SD Kappa SD
>>> 0.25 0.684 0.63 0.0353 0.0416
>>> 0.5 0.729 0.685 0.0379 0.0445
>>> 1 0.756 0.716 0.0357 0.0418
>>>
>>> Tuning parameter ‘sigma’ was held constant at a value of 0.247
>>> Kappa was used to select the optimal model using the largest value.
>>> The final values used for the model were C = 1 and sigma = 0.247.
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] e1071_1.6-1 class_7.3-5 kernlab_0.9-17 repmis_0.2.4 caret_5.15-61 reshape2_1.2.2 plyr_1.8 lattice_0.20-10 foreach_1.4.0 cluster_1.14.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] codetools_0.2-8 compiler_2.15.2 digest_0.6.0 evaluate_0.4.3 formatR_0.7 grid_2.15.2 httr_0.2 iterators_1.0.6 knitr_1.1 RCurl_1.95-4.1 stringr_0.6.2 tools_2.15.2
>>>
>>> ### Results - R3.0.2
>>>
>>>> require(caret)
>>>> svm.m1 <- train(df[,-1], df[,1], method='svmRadial', metric='Kappa', tuneLength=5, trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
>>> Loading required package: class
>>> Warning messages:
>>> 1: closing unused connection 4 (https://dl.dropboxusercontent.com/u/47973221/df.Rdata)
>>> 2: executing %dopar% sequentially: no parallel backend registered
>>>> svm.m1
>>> 1241 samples
>>> 7 predictors
>>> 10 classes: ‘O27479’, ‘O31403’, ‘O32057’, ‘O32059’, ‘O32060’, ‘O32078’, ‘O32089’, ‘O32663’, ‘O32668’, ‘O32676’
>>>
>>> No pre-processing
>>> Resampling: Cross-Validation (10 fold, repeated 10 times)
>>>
>>> Summary of sample sizes: 1118, 1117, 1115, 1117, 1116, 1118, ...
>>>
>>> Resampling results across tuning parameters:
>>>
>>> C Accuracy Kappa Accuracy SD Kappa SD
>>> 0.25 0.372 0.278 0.033 0.0371
>>> 0.5 0.39 0.297 0.0317 0.0358
>>> 1 0.399 0.307 0.0289 0.0323
>>>
>>> Tuning parameter ‘sigma’ was held constant at a value of 0.2148907
>>> Kappa was used to select the optimal model using the largest value.
>>> The final values used for the model were C = 1 and sigma = 0.215.
>>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] e1071_1.6-1 class_7.3-9 kernlab_0.9-19 repmis_0.2.6.2 caret_5.17-7 reshape2_1.2.2 plyr_1.8 lattice_0.20-24 foreach_1.4.1 cluster_1.14.4
>>>
>>> loaded via a namespace (and not attached):
>>> [1] codetools_0.2-8 compiler_3.0.2 digest_0.6.3 grid_3.0.2 httr_0.2 iterators_1.0.6 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Max
>
>
>
> --
>
> Max