[R] Statistical significance of a classifier
Martin C. Martin
martin at metahuman.org
Fri Aug 5 21:58:39 CEST 2005
Hi,
I have a bunch of data points x from two classes A & B, and I'm creating
a classifier. So I have a function f(x) which estimates the probability
that x is in class A. (I have an equal number of examples of each, so
p(class) = 0.5.)
One way of seeing how well this does is to compute the error rate on the
test set, i.e. if f(x)>0.5 call it A, and see how many times I
misclassify an item. That's what MASS does. But we should be able to
do better: a misclassification should count for more when the regression
is confident than when it isn't.
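To make that concrete, here is a minimal sketch in Python (not R, and not what MASS does) comparing the plain error rate with the mean log loss, one common score that penalizes confident mistakes more heavily. The function names and the four data points are invented for illustration only.

```python
import math

def error_rate(probs, labels, threshold=0.5):
    """Fraction misclassified when f(x) > threshold is called class A."""
    wrong = sum((p > threshold) != y for p, y in zip(probs, labels))
    return wrong / len(labels)

def mean_log_loss(probs, labels, eps=1e-12):
    """Average of -log(probability assigned to the true class)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -math.log(p if y else 1 - p)
    return total / len(labels)

# Made-up test set: True means class A. Both sets of predictions get
# only the second item wrong, but one of them is far more confident.
labels = [True, True, False, False]
hesitant = [0.9, 0.45, 0.1, 0.2]
confident = [0.9, 0.01, 0.1, 0.2]

print(error_rate(hesitant, labels), error_rate(confident, labels))
print(mean_log_loss(hesitant, labels), mean_log_loss(confident, labels))
```

Both sets of predictions have the same error rate, but the confidently wrong one gets the larger log loss. For reference, always predicting 0.5 gives a log loss of log(2) per item.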
How can I show that my f(x) = P(x is in class A) does better than chance?
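One possible way to frame the question, sketched in Python under stated assumptions rather than offered as the definitive answer, is a permutation test: score f(x) on the test set by its mean log-likelihood, then shuffle the class labels many times to see how often chance alone scores at least as well. The function names, the number of permutations, and the data are all hypothetical.

```python
import math
import random

def mean_log_lik(probs, labels, eps=1e-12):
    """Average log of the probability assigned to the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += math.log(p if y else 1 - p)
    return total / len(labels)

def permutation_p_value(probs, labels, n_perm=2000, seed=0):
    """One-sided p-value: share of label shufflings scoring >= observed."""
    rng = random.Random(seed)
    observed = mean_log_lik(probs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between f(x) and class
        if mean_log_lik(probs, shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up test set with equal class sizes: True means class A.
labels = [True] * 10 + [False] * 10
informative = [0.8] * 10 + [0.2] * 10   # f(x) tracks the class
uninformative = [0.5] * 20              # f(x) carries no information

p_informative = permutation_p_value(informative, labels)
p_uninformative = permutation_p_value(uninformative, labels)
print(p_informative, p_uninformative)
```

Shuffling keeps the class balance fixed, which matches the equal-sized classes described above. The score is only meaningful on held-out test data, not on the data used to fit f.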
Thanks,
Martin