[R] aucRoc in caret package [SEC=UNCLASSIFIED]

David Winsemius dwinsemius at comcast.net
Thu Jun 2 06:08:21 CEST 2011


On Jun 1, 2011, at 10:41 PM, Max Kuhn wrote:

> David,
>
> The ROC curve should really be computed with some sort of numeric data
> (as opposed to classes). It varies the cutoff to get a continuum of
> sensitivity and specificity values.  Using the classes as 1's and 2's
> implies that the second class is twice the value of the first, which
> doesn't really make sense.
>
> Try getting the class probabilities for predicted1 and predicted2 and
> use those instead.

Yes. You should be addressing this to Jin. I have been trying with  
little success to explain this.
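
For what it is worth, here is a minimal base-R sketch of what a ROC area amounts to when the "score" has only two levels. It does not use caret; auc_rank(), rebuild() and half_sum() are names made up for this illustration, and the data are rebuilt from the counts in the two tables quoted below, with "soft" arbitrarily taken as the positive class. With a single cutoff the area reduces to (sensitivity + specificity) / 2, so it carries no more information than the table itself.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case scores higher than a randomly chosen negative case,
# with ties counted as one half. (Hypothetical helper, not a caret function.)
auc_rank <- function(score, is_positive) {
  r <- rank(score)                 # average ranks, so ties count as 1/2
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Rebuild case-level data from table counts given in the order
# 1/hard, 2/hard, 1/soft, 2/soft. (Hypothetical helper.)
rebuild <- function(counts) {
  data.frame(
    predicted = rep(c(1, 2, 1, 2), times = counts),
    trainy    = rep(c("hard", "hard", "soft", "soft"), times = counts)
  )
}

d1 <- rebuild(c(27, 11, 0, 99))    # counts from table(predicted1, trainy)
d2 <- rebuild(c(27, 11, 2, 97))    # counts from table(predicted2, trainy)

# Assumption: "soft" is the positive class, so code 2 is the "higher" score.
auc1 <- auc_rank(d1$predicted, d1$trainy == "soft")
auc2 <- auc_rank(d2$predicted, d2$trainy == "soft")

# Same quantity computed directly as (specificity + sensitivity) / 2.
half_sum <- function(counts) {
  (counts[1] / sum(counts[1:2]) + counts[4] / sum(counts[3:4])) / 2
}

print(round(c(predicted1 = auc1, predicted2 = auc2), 7))
```

Under these assumptions the second value agrees with the 0.8451621 reported for predicted2, while the table for predicted1 gives about 0.855 rather than 0.5. Either way, class probabilities would give the curve more than one cutoff to work with.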

-- 
David.
>
> Thanks,
>
> Max
>
>
> On Wed, Jun 1, 2011 at 9:24 PM, <Jin.Li at ga.gov.au> wrote:
>>
>> Please note that predicted1 and predicted2 are two sets of
>> predictions, not predictors. As you can see, the predictions
>> have only two levels: 1 for hard and 2 for soft. I need to
>> assess which set is more accurate. I hope this is clear now. Thanks.
>> Jin
>>
>> -----Original Message-----
>> From: David Winsemius [mailto:dwinsemius at comcast.net]
>> Sent: Thursday, 2 June 2011 10:55 AM
>> To: Li Jin
>> Cc: R-help at r-project.org
>> Subject: Re: [R] aucRoc in caret package [SEC=UNCLASSIFIED]
>>
>> Using AUC for discrete predictor variables with only two levels
>> doesn't seem very sensible. What are you planning to do with this
>> measure?
>>
>> --
>> David.
>>
>> On Jun 1, 2011, at 8:47 PM, <Jin.Li at ga.gov.au> wrote:
>>
>>> Hi all,
>>> I used the following code and data to get auc values for two sets of
>>> predictions:
>>>            library(caret)
>>>> table(predicted1, trainy)
>>>   trainy
>>>    hard soft
>>>  1   27    0
>>>  2   11   99
>>>> aucRoc(roc(predicted1, trainy))
>>> [1] 0.5
>>>
>>>
>>>> table(predicted2, trainy)
>>>   trainy
>>>    hard soft
>>>  1   27    2
>>>  2   11   97
>>>> aucRoc(roc(predicted2, trainy))
>>> [1] 0.8451621
>>>
>>> predicted1:
>>> 1 1 2 2 2 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2
>>> 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
>>> 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 2
>>> 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2  
>>> 2 2
>>>
>>> predicted2:
>>> 1 1 2 1 2 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 1 1 2
>>> 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
>>> 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 2
>>> 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2  
>>> 2 2
>>>
>>> trainy:
>>> hard hard hard soft soft hard hard hard hard soft soft soft soft
>>> soft soft hard soft soft soft soft soft soft hard soft soft soft
>>> soft soft soft soft soft soft hard soft soft soft soft soft hard
>>> soft soft soft soft hard hard soft soft soft hard soft hard soft
>>> soft soft soft soft hard soft soft soft soft soft soft soft soft
>>> hard soft soft soft soft soft hard soft soft soft soft soft soft
>>> soft hard soft soft soft hard hard hard hard hard soft soft hard
>>> hard hard soft hard soft soft soft hard hard soft soft soft soft
>>> soft hard hard hard hard hard hard hard soft soft soft soft soft
>>> soft soft soft soft soft soft soft soft soft soft soft hard soft
>>> soft soft soft soft soft soft soft
>>> Levels: hard soft
>>>
>>>> Sys.info()
>>>                       sysname                       release
>>>                     "Windows"                          "XP"
>>>                       version                      nodename
>>>  "build 2600, Service Pack 3"                    "PC-60772"
>>>                       machine
>>>                         "x86"
>>>
>>> I would expect predicted1 to be more accurate than predicted2, but
>>> the auc values show the opposite. I was wondering whether this is a
>>> bug or whether I have done something wrong. Thanks in advance for
>>> your help!
>>>
>>> Cheers,
>>>
>>> Jin
>>> ____________________________________
>>> Jin Li, PhD
>>> Spatial Modeller/Computational Statistician
>>> Marine & Coastal Environment
>>> Geoscience Australia
>>> GPO Box 378, Canberra, ACT 2601, Australia
>>>
>>> Ph: 61 (02) 6249 9899; email: jin.li at ga.gov.au
>>> _______________________________________
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
>
>
> --
>
> Max

David Winsemius, MD
West Hartford, CT


