[R] aucRoc in caret package

David Winsemius dwinsemius at comcast.net
Thu Jun 2 04:24:33 CEST 2011


On Jun 1, 2011, at 9:24 PM, <Jin.Li at ga.gov.au> <Jin.Li at ga.gov.au> wrote:

> Please note that predicted1 and predicted2 are two sets of
> predictions, not predictors. As you can see, the predictions
> have only two levels: 1 is for hard and 2 for soft.

Yes, I (very clearly, I think) saw that.

> I need to assess which one is more accurate. Hope this is clear now.  
> Thanks.
> Jin

So how deep do you want to dig this hole? AUC is not designed to be a
score for categorical predictions. It's designed for a continuous
predictor. The only information in your two-way classification of
dichotomous states is in the off-diagonal cells: 11 and naught versus
11 and 2. Other than that you have total agreement. Not much to work with.
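
For what it's worth, here is a rough base-R sketch of where a number
like 0.845 can come from. This is just the trapezoidal reading of a
one-point ROC curve, not a claim about what aucRoc() does internally;
the helper name and the orientation (row "2" standing for "soft") are
my own assumptions:

   ## sketch only: with a two-level prediction the ROC curve has a
   ## single interior point, so AUC collapses to (sens + spec) / 2
   auc_from_table <- function(tab, positive = "soft") {
     neg  <- setdiff(colnames(tab), positive)
     ## assumes row "2" predicts the positive class, row "1" the other
     sens <- tab["2", positive] / sum(tab[, positive])
     spec <- tab["1", neg]      / sum(tab[, neg])
     (sens + spec) / 2
   }

   ## the two confusion tables from your post
   tab1 <- as.table(matrix(c(27, 11, 0, 99), nrow = 2,
             dimnames = list(predicted = c("1", "2"),
                             trainy    = c("hard", "soft"))))
   tab2 <- as.table(matrix(c(27, 11, 2, 97), nrow = 2,
             dimnames = list(predicted = c("1", "2"),
                             trainy    = c("hard", "soft"))))

   auc_from_table(tab1)   # 0.8552632
   auc_from_table(tab2)   # 0.8451621, the value you reported

By that arithmetic predicted2 comes out at 0.8451621 and predicted1 at
about 0.855 rather than the 0.5 you got, which only underlines that
handing a two-level prediction to an ROC routine is asking for trouble.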

-- 
david.

>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Thursday, 2 June 2011 10:55 AM
> To: Li Jin
> Cc: R-help at r-project.org
> Subject: Re: [R] aucRoc in caret package [SEC=UNCLASSIFIED]
>
> Using AUC for discrete predictor variables with only two levels
> doesn't seem very sensible. What are you planning to do with this
> measure?
>
> -- 
> David.
>
> On Jun 1, 2011, at 8:47 PM, <Jin.Li at ga.gov.au> <Jin.Li at ga.gov.au>  
> wrote:
>
>> Hi all,
>> I used the following code and data to get auc values for two sets of
>> predictions:
>>           library(caret)
>>> table(predicted1, trainy)
>>  trainy
>>   hard soft
>> 1   27    0
>> 2   11   99
>>> aucRoc(roc(predicted1, trainy))
>> [1] 0.5
>>
>>
>>> table(predicted2, trainy)
>>  trainy
>>   hard soft
>> 1   27    2
>> 2   11   97
>>> aucRoc(roc(predicted2, trainy))
>> [1] 0.8451621
>>
>> predicted1:
>> 1 1 2 2 2 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2
>> 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
>> 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 2
>> 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>
>> predicted2:
>> 1 1 2 1 2 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 1 1 2
>> 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
>> 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 2
>> 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>
>> trainy:
>> hard hard hard soft soft hard hard hard hard soft soft soft soft
>> soft soft hard soft soft soft soft soft soft hard soft soft soft
>> soft soft soft soft soft soft hard soft soft soft soft soft hard
>> soft soft soft soft hard hard soft soft soft hard soft hard soft
>> soft soft soft soft hard soft soft soft soft soft soft soft soft
>> hard soft soft soft soft soft hard soft soft soft soft soft soft
>> soft hard soft soft soft hard hard hard hard hard soft soft hard
>> hard hard soft hard soft soft soft hard hard soft soft soft soft
>> soft hard hard hard hard hard hard hard soft soft soft soft soft
>> soft soft soft soft soft soft soft soft soft soft soft hard soft
>> soft soft soft soft soft soft soft
>> Levels: hard soft
>>
>>> Sys.info()
>>     sysname     release                       version      nodename
>>   "Windows"        "XP"  "build 2600, Service Pack 3"    "PC-60772"
>>     machine
>>       "x86"
>>
>> I would expect predicted1 to be more accurate than predicted2, but
>> the AUC values show the opposite. I was wondering whether this is a
>> bug or whether I have done something wrong. Thanks for your help in
>> advance!
>>
>> Cheers,
>>
>> Jin
>> ____________________________________
>> Jin Li, PhD
>> Spatial Modeller/Computational Statistician

David Winsemius, MD
West Hartford, CT


