[R] e1071 question: what's the definition of performance in tune.* functions?

Tue Jul 13 03:40:26 CEST 2004

Looking at the body of tune(), it has:

...
                repeat.errors[reps] <- if (is.factor(true.y)) 
                  1 - classAgreement(table(pred, true.y))
                else crossprod(pred - true.y)/length(pred)
...

where classAgreement() is a function defined inside tune() that computes the
fraction of correctly predicted cases.  So it looks like tune() and friends
are returning error rates as fractions, not percentages.
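To make the two branches concrete, here is a minimal base-R sketch of the same logic. The helper name perf_measure() and the toy vectors are made up for illustration and are not part of e1071; the classification branch uses a plain mismatch fraction, which I am assuming is equivalent to 1 - classAgreement(...) as described above.

```r
# Hypothetical helper mirroring the branch quoted from tune(); not e1071 code.
perf_measure <- function(pred, true.y) {
  if (is.factor(true.y)) {
    # Classification: fraction of misclassified cases, always within [0, 1]
    mean(as.character(pred) != as.character(true.y))
  } else {
    # Regression: mean squared error, which has no upper bound
    as.numeric(crossprod(pred - true.y)) / length(pred)
  }
}

# Toy data (invented): numeric 0/1 labels are treated as regression targets
truth_num <- c(0, 1, 1, 0, 1)
pred_num  <- c(1.5, -0.5, 1, 0, 2)
mse <- perf_measure(pred_num, truth_num)   # 5.5 / 5 = 1.1, larger than 1

# The same kind of labels as factors give a misclassification fraction
truth_fac <- factor(c("a", "b", "b", "a", "b"))
pred_fac  <- factor(c("b", "b", "b", "a", "a"), levels = levels(truth_fac))
err <- perf_measure(pred_fac, truth_fac)   # 2 of 5 wrong = 0.4

print(c(mse = mse, err = err))
```

So a value above 1 is possible only in the mean-squared-error branch, never in the classification branch.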

You're right that the fraction shouldn't be larger than 1.  Did you make
sure that tune() sees the data as classification, not regression (i.e., did
you make sure that the class labels given to tune.*() are a factor)?
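A quick way to check, sketched under the assumption that the labels are stored as numeric 0/1 codes (the variable names are made up). The tune.svm() call follows the pattern of the example in the e1071 help page and only runs if the package is installed:

```r
# Invented example labels: numeric codes would be treated as regression targets
labels_raw <- c(0, 1, 1, 0, 1)
is.factor(labels_raw)          # FALSE

# Convert to a factor so that the classification error is used instead
labels <- factor(labels_raw)
is.factor(labels)              # TRUE
levels(labels)

# Optional: with a factor response (iris$Species), the reported performance
# is a misclassification fraction, so it should lie between 0 and 1.
if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1)
  fit <- e1071::tune.svm(Species ~ ., data = iris,
                         gamma = 2^(-1:1), cost = 2^(2:4))
  print(fit$best.performance)
}
```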

HTH,
Andy

> From: Tae-Hoon Chung [mailto:thchung at tgen.org] 
> 
> Thanks Andy, but let me clarify.
> 
> When you run tune.*, you get a performance value like 0.7...
> If this value is a percentage, the error rate is 0.7%, which is excellent
> (of course, we should check whether this is really a case of
> over-fitting, but nominally this error rate is great).
> However, if this error rate is a ratio, then 0.7 is poor because
> we basically have a 70% error rate.
> So my question is whether the error rate is presented as a percentage
> or as a plain ratio.
> One puzzling thing is that when you run tune.*, you also get values
> like 1.2*, which makes it absurd to regard it as a ratio, because a
> ratio larger than 1 is really absurd, right?
> However, since the definition is not explicitly given anywhere, it is
> hard to interpret the result properly.
> 
> Thanks in advance;
> TH
> 
> On Jul 12, 2004, at 5:55 PM, Liaw, Andy wrote:
> 
> > Basically, the `Details' section of ?tune says it all:
> >
> > Details:
> >
> >      As performance measure, the classification error is used for
> >      classification, and the mean squared error for regression. ...
> >
> >
> > Andy
> >
> >> From: Tae-Hoon Chung
> >>
> >> Hi, all;
> >>
> >> Basically, the subject contains all the information I need to know.
> >> In the e1071 library, there are functions to tune parameters.
> >> They return several values, one of which is the performance.
> >> Does anybody know the "definition" of performance here?
> >> Is it the percentage of error, or just the error rate, or something else?
> >>
> >> Thanks in advance!
> >>
> >> Tae-Hoon Chung, Ph.D
> >>
> >> Post-doctoral Research Fellow
> >> Molecular Diagnostics and Target Validation Division
> >> Translational Genomics Research Institute
> >> 1275 W Washington St, Tempe AZ 85281 USA
> >> Phone: 602-343-8724
> >>
> >
> >