[R] Statistical significance of a classifier

Fri Aug 5 21:58:39 CEST 2005

Hi,

I have a bunch of data points x from two classes A & B, and I'm creating 
a classifier.  So I have a function f(x) which estimates the probability 
that x is in class A.  (I have an equal number of examples of each, so 
p(class) = 0.5.)

One way of seeing how well this does is to compute the error rate on the 
test set, i.e. if f(x)>0.5 call it A, and see how many times I 
misclassify an item.  That's what MASS does.  But we should be able to 
do better: misclassifying should be more of a problem if the regression 
is confident than if it isn't.
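
To make that concrete, here is a rough sketch of the comparison I have in 
mind. It is in Python purely for illustration, the data are made up, and the 
names error_rate and mean_log_loss are my own. The log score is only one 
possible confidence-sensitive measure (an assumption on my part, not 
something MASS computes for you). A constant guess of 0.5 scores log(2) on 
this measure, which gives a natural "chance" baseline.

```python
import math

def error_rate(probs, labels, threshold=0.5):
    """Fraction misclassified when f(x) > threshold is called class A."""
    wrong = sum((p > threshold) != is_a for p, is_a in zip(probs, labels))
    return wrong / len(labels)

def mean_log_loss(probs, labels, eps=1e-12):
    """Average negative log-likelihood of the true class.

    A mistake made with f(x) near 0 or 1 costs far more than one made
    with f(x) near 0.5.
    """
    total = 0.0
    for p, is_a in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p if is_a else 1.0 - p)
    return total / len(labels)

# Made-up test set: True means the item really is in class A.
labels = [True, True, True, False, False, False]

# Two hypothetical classifiers that make the same two mistakes.
hedged = [0.9, 0.8, 0.45, 0.1, 0.2, 0.55]     # mistakes are near 0.5
confident = [0.9, 0.8, 0.05, 0.1, 0.2, 0.95]  # mistakes are near 0 or 1

# Log loss of always predicting 0.5, i.e. of guessing at chance.
chance = math.log(2)

for name, probs in [("hedged", hedged), ("confident", confident)]:
    print(name, error_rate(probs, labels), mean_log_loss(probs, labels))
print("chance baseline", chance)
```

Both classifiers have the same error rate, but the log score separates them: 
the hedged one comes in below log(2) and the confident one above it. This 
only ranks the two against the baseline; it does not by itself say whether 
the difference is statistically significant.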

How can I show that my f(x) = P(x is in class A) does better than chance?

Thanks,
Martin