[R] Statistical significance of a classifier
Liaw, Andy
andy_liaw at merck.com
Fri Aug 5 22:06:05 CEST 2005
> From: Martin C. Martin
>
> Hi,
>
> I have a bunch of data points x from two classes A & B, and I'm
> creating a classifier. So I have a function f(x) which estimates the
> probability that x is in class A. (I have an equal number of examples
> of each, so p(class) = 0.5.)
>
> One way of seeing how well this does is to compute the error rate on
> the test set, i.e. if f(x) > 0.5 call it A, and see how many times I
> misclassify an item. That's what MASS does. But we should
Surely you mean `99% of data miners/machine learners' rather than `MASS'?
> be able to do better: misclassifying should be more of a problem when
> the regression is confident than when it isn't.
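A minimal sketch of that check in base R, assuming hypothetical objects
`f' for the fitted probability function and `x.test', `y.test' for the
test set; the exact binomial test at the end is one simple way to ask
whether the error rate beats the 0.5 chance level:

  p.hat <- f(x.test)                      # estimated P(x is in class A)
  pred  <- ifelse(p.hat > 0.5, "A", "B")  # threshold at 0.5
  mean(pred != y.test)                    # test-set misclassification rate
  ## with balanced classes, chance is an error rate of 0.5;
  ## test one-sided whether we do better than that:
  binom.test(sum(pred != y.test), length(y.test), p = 0.5,
             alternative = "less")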
>
> How can I show that my f(x) = P(x is in class A) does better
> than chance?
It depends on what you mean by `better'. For some problems, people are
perfectly happy with the misclassification rate. For others, the
estimated probabilities count a lot more. One possibility is to look at
the ROC curve. Another possibility is to look at the calibration curve
(see MASS, the book).
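Both curves can be hand-rolled in a few lines of base R (add-on packages
exist for ROC as well). A sketch, reusing the hypothetical `p.hat' and
`y.test' from above:

  ## ROC: sweep the threshold, trace sensitivity vs. 1 - specificity
  thr <- sort(unique(p.hat), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(p.hat[y.test == "A"] > t))
  fpr <- sapply(thr, function(t) mean(p.hat[y.test == "B"] > t))
  plot(fpr, tpr, type = "l", xlab = "False positive rate",
       ylab = "True positive rate", main = "ROC")
  abline(0, 1, lty = 2)  # a chance-level classifier hugs this diagonal

  ## crude calibration curve: bin the predictions, compare each bin's
  ## mean predicted probability with the observed fraction of A's
  bins <- cut(p.hat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
  plot(tapply(p.hat, bins, mean), tapply(y.test == "A", bins, mean),
       xlim = c(0, 1), ylim = c(0, 1), xlab = "Mean predicted probability",
       ylab = "Observed fraction of class A")
  abline(0, 1, lty = 2)  # points near this line mean good calibration

An ROC curve that bows well above the diagonal, or calibration points
that sit close to it, are evidence that f(x) is doing better than chance.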
Andy
> Thanks,
> Martin
>