[R] Best SVM Performance measure?

Tue Oct 20 08:00:40 CEST 2009

Hi,

The answer is probably going to be "it depends on what you want", but 
I'm very curious to see if the group has an opinion or some general 
suggestions.

The actual experiment is too complicated for a quick e-mail, but I'll 
summarize well enough (hopefully) to get the concepts across.

Binary classification problem
Using an SVM (e1071) to train a model
Experimenting with different features, costs, etc.
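
As a rough sketch of this setup (the data here is simulated and purely 
hypothetical, not the data from our experiment; make_data, x1, x2 and the 
set sizes are made-up names and numbers for illustration only):

```r
library(e1071)

set.seed(1)

# Hypothetical stand-in data: two numeric features and a binary label.
make_data <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- factor(ifelse(d$x1 + d$x2 + rnorm(n, sd = 0.5) > 0, "pos", "neg"),
                levels = c("neg", "pos"))
  d
}
train <- make_data(1000)  # stands in for the large training set
test  <- make_data(300)   # stands in for the separate, unseen test set

# cost (along with kernel, gamma, feature set) is one of the knobs being varied.
# probability = TRUE asks svm() to also fit a model for class probabilities.
fit <- svm(y ~ x1 + x2, data = train, kernel = "radial",
           cost = 1, probability = TRUE)

pred     <- predict(fit, newdata = test, probability = TRUE)
prob_pos <- attr(pred, "probabilities")[, "pos"]  # estimated P(y = "pos")

# Fraction of test cases labeled correctly.
accuracy <- mean(as.character(pred) == as.character(test$y))
```

The predicted labels feed the "percentage labeled correct" figure, while 
the probabilities in prob_pos are what a calibration check would consume.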

Training data and test data are completely separate data sets drawn from 
the same population.  The general concept was to train on a large set of 
data and then test on a medium-sized set of unseen data.

We're looking for the best classification performance for future 
unlabeled data.

Here is the puzzle:

Comparing two versions of the model.
     A - Lower R2 (r squared) score but higher percentage labeled 
correct on test data
     B - Higher R2 score but lower percentage labeled correct on test data

We're using the val.prob function from the Design library to evaluate 
our model.
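
To make the puzzle concrete, here is a minimal base-R sketch with made-up 
numbers (not our data).  It assumes the R2 in question is a 
likelihood-based, Nagelkerke-type index computed from the predicted 
probabilities; that is my reading and may be wrong in detail.  The 
function names are hypothetical helpers, not part of any package.

```r
# Made-up outcomes and predicted probabilities for two hypothetical models.
y   <- c(1, 1, 1, 1, 0, 0, 0, 0)
p_a <- c(0.55, 0.55, 0.55, 0.55, 0.45, 0.45, 0.45, 0.45)  # always right, never confident
p_b <- c(0.90, 0.90, 0.90, 0.45, 0.10, 0.10, 0.10, 0.55)  # confident, but two misses

# Fraction labeled correctly when thresholding at 0.5.
accuracy <- function(p, y) mean((p > 0.5) == (y == 1))

# Nagelkerke R2 of the probabilities taken as-is, relative to a model
# that always predicts the base rate.
nagelkerke_r2 <- function(p, y) {
  n   <- length(y)
  ll1 <- sum(y * log(p) + (1 - y) * log(1 - p))
  p0  <- mean(y)
  ll0 <- sum(y * log(p0) + (1 - y) * log(1 - p0))
  (1 - exp(2 * (ll0 - ll1) / n)) / (1 - exp(2 * ll0 / n))
}

# Mean squared error of the probabilities (lower is better).
brier <- function(p, y) mean((p - y)^2)

accuracy(p_a, y)       # 1.00
accuracy(p_b, y)       # 0.75
nagelkerke_r2(p_a, y)  # roughly 0.23
nagelkerke_r2(p_b, y)  # roughly 0.75
```

In this toy case the first model labels more cases correctly while the 
second scores higher on R2, because accuracy only looks at which side of 
0.5 each prediction falls on, whereas a likelihood-based R2 rewards 
confident probabilities that are right and penalizes those that are wrong.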

Additionally, the graphs from val.prob are interesting:
     A - Our "non-parametric" line mostly parallels the ideal line but 
is just a bit above.
     B - Our "non-parametric" line mostly parallels the ideal line but 
is just a bit below.

If I understand things correctly, with model A the actual probability 
is slightly higher than our predicted probability (not a bad thing for 
our application: it is better to under-predict than to over-predict).
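
A binned calibration table is one way to check that reading of a curve 
sitting above the ideal line.  The sketch below uses simulated, 
hypothetical data in which the true probability is deliberately set above 
the predicted one; the shift of 0.4 on the logit scale is an arbitrary 
choice for illustration.

```r
set.seed(42)
n <- 5000

p_pred <- runif(n, 0.05, 0.95)           # hypothetical predicted probabilities
p_true <- plogis(qlogis(p_pred) + 0.4)   # true risk sits above the prediction
y      <- rbinom(n, size = 1, prob = p_true)

# Group predictions into ten bins and compare the mean prediction with the
# observed event rate in each bin.
bins  <- cut(p_pred, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
calib <- data.frame(
  mean_predicted = as.vector(tapply(p_pred, bins, mean)),
  observed_rate  = as.vector(tapply(y, bins, mean)),
  row.names      = levels(bins)
)

# Positive gap: events occur more often than predicted (under-prediction).
calib$gap <- calib$observed_rate - calib$mean_predicted
print(round(calib, 3))
```

If the gaps are mostly positive, the model under-predicts, which would 
correspond to a calibration curve lying above the ideal line.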

One thought was that R2 measures the distance from the "ideal line". 
With model A, we are a touch further from the ideal line, but on the 
side we prefer, so in a better position than model B.

Does anybody have any insight?

Thanks,

-N