[R] Best SVM Performance measure?
Noah Silverman
noah at smartmediacorp.com
Mon Oct 19 23:20:04 CEST 2009
Hi,
This is probably going to be one of those "it depends on what you want"
kinds of answers, but I'm very curious to see if the group has an opinion
or some general suggestions.
The actual experiment is too complicated for a quick e-mail, but I'll
summarize it well enough (hopefully) to get the concepts across.
Binary classification problem
Using an SVM (e1071) to train a model
Experimenting with different features, costs, etc.
Training data and test data are completely separate data sets drawn from
the same population. The general concept was to train on a large set of
data and then test on a medium-sized set of unseen data.
We're looking for the best classification performance for future
unlabeled data.
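
A minimal sketch of this kind of setup, assuming synthetic data in place of the real features. The variable names, the two made-up predictors, the radial kernel, and cost = 1 are all placeholders for illustration, not the actual experiment:

```r
library(e1071)  # provides svm()

set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
# Synthetic binary label with some noise (placeholder for the real data)
y   <- factor(ifelse(x1 + x2 + rnorm(n, sd = 0.5) > 0, "pos", "neg"))
dat <- data.frame(x1 = x1, x2 = x2, y = y)

train <- dat[1:200, ]    # larger training set
test  <- dat[201:300, ]  # smaller, unseen test set

# probability = TRUE asks svm() to also fit a probability model
fit <- svm(y ~ x1 + x2, data = train, kernel = "radial",
           cost = 1, probability = TRUE)

pred     <- predict(fit, newdata = test, probability = TRUE)
prob_pos <- attr(pred, "probabilities")[, "pos"]  # predicted P(y = "pos")

accuracy <- mean(pred == test$y)  # fraction labeled correctly on test data
accuracy
```

The predicted probabilities in prob_pos, together with the 0/1 test outcomes, are the kind of input a calibration check would take.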
Here is the puzzle:
Comparing two versions of the model.
A - Lower R2 (r squared) score but higher percentage labeled
correct on test data
B - Higher R2 score but lower percentage labeled correct on test data
We're using the val.prob function from the Design package to evaluate
our model.
Additionally, the graphs from val.prob are interesting:
A - Our "non-parametric" line mostly parallels the ideal line but
is just a bit above.
B - Our "non-parametric" line mostly parallels the ideal line but
is just a bit below.
If I understand things correctly, with model A, the actual probability
is slightly higher than our predicted probability (not a bad thing for
our application: better to under-predict than over-predict).
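
That reading of the plot can be checked on simulated data. The sketch below assumes predictions that are uniformly 0.1 too low, and uses a base R lowess smoother as a stand-in for the non-parametric curve (the simulated numbers and variable names are illustrative only):

```r
set.seed(42)
n      <- 5000
p_pred <- runif(n, 0.1, 0.8)   # predicted probabilities
p_true <- p_pred + 0.1         # actual probabilities are 0.1 higher
y      <- rbinom(n, size = 1, prob = p_true)

# Smoothed observed frequency as a function of predicted probability
cal <- lowess(p_pred, y, iter = 0)

# Average vertical distance between the smoothed curve and the ideal line;
# a positive gap means the curve lies above the diagonal
gap <- mean(cal$y - cal$x)
c(mean_predicted = mean(p_pred), mean_observed = mean(y), gap = gap)
```

With under-prediction built in, the smoothed curve sits above the diagonal, which is consistent with the interpretation of model A above.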
One thought was that the R2 measures the distance from the "ideal
line". With model A, we are a touch further from the ideal line, but in
a better position than model B.
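
One way to probe that thought is a toy comparison. The sketch below assumes the reported R2 is a likelihood-based index of the Nagelkerke type, computed directly from the predicted probabilities. That is an assumption worth checking against the val.prob documentation, and the hand-written nagelkerke_r2 helper and the eight made-up cases are illustrative, not val.prob's own code:

```r
# Nagelkerke R2 from predicted probabilities p and 0/1 outcomes y
# (hypothetical helper, written out by hand for illustration)
nagelkerke_r2 <- function(p, y) {
  n        <- length(y)
  ll_model <- sum(y * log(p) + (1 - y) * log(1 - p))
  p0       <- mean(y)  # null model: predict the base rate for everyone
  ll_null  <- sum(y * log(p0) + (1 - y) * log(1 - p0))
  lr       <- 2 * (ll_model - ll_null)
  (1 - exp(-lr / n)) / (1 - exp(2 * ll_null / n))
}

# Fraction labeled correctly when classifying at a 0.5 cutoff
accuracy <- function(p, y) mean((p > 0.5) == (y == 1))

y   <- c(1, 1, 1, 1, 0, 0, 0, 0)
# Model A: always on the right side of 0.5, but never confident
p_a <- c(0.55, 0.55, 0.55, 0.55, 0.45, 0.45, 0.45, 0.45)
# Model B: confident and usually right, but wrong on two cases
p_b <- c(0.95, 0.95, 0.95, 0.45, 0.05, 0.05, 0.05, 0.55)

rbind(A = c(accuracy = accuracy(p_a, y), R2 = nagelkerke_r2(p_a, y)),
      B = c(accuracy = accuracy(p_b, y), R2 = nagelkerke_r2(p_b, y)))
```

Under that assumption, A labels every case correctly yet scores the lower R2, because a likelihood-based index rewards confident, well-separated probabilities and not only agreement at the 0.5 cutoff. So the two measures can disagree without either being wrong.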
Does anybody have any insight?
Thanks,
-N