[R] SVM question

Murad Nayal mn216 at columbia.edu
Tue Nov 18 18:04:09 CET 2003



Hello all,

I am trying to use svm (from the e1071 package) to solve a binary
classification problem. The two classes in my data set are unequally
populated: class "I" (for important) has about 3,000 instances, while
class "B" (for background) has about 20,000. Experimenting with
different classifiers, I realized that with such an asymmetry there is
a danger of trivially inflating accuracy by biasing the classifier
towards the more prevalent class. For example, using the numbers cited
above, if the testing set maintains the same distribution of classes as
the original data set, then you can get an accuracy of about 85% by
simply classifying everything as "B". That is an unsatisfactory
classifier, given the importance of detecting the "I" class.
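
To make the arithmetic concrete, here is a minimal sketch in base R. It
builds a hypothetical test set with the class counts quoted above and
scores a trivial classifier that always answers "B". The object names
(n, truth, pred, cm) are made up for illustration, and "coverage" here
means the fraction of true "I" cases that are predicted as "I".

```r
# Class counts quoted above (approximate figures from the post).
n <- c(I = 3000, B = 20000)

# Hypothetical test set with the same class proportions.
truth <- factor(rep(names(n), times = n), levels = c("I", "B"))

# Trivial classifier: predict "B" for every case.
pred <- factor(rep("B", length(truth)), levels = c("I", "B"))

# Confusion matrix: rows = true class, columns = predicted class.
cm <- table(truth = truth, predicted = pred)

accuracy   <- sum(diag(cm)) / sum(cm)          # fraction of all cases correct
coverage_I <- cm["I", "I"] / sum(cm["I", ])    # fraction of true "I" recovered

print(cm)
cat(sprintf("accuracy = %.3f, coverage of I = %.3f\n", accuracy, coverage_I))
```

With these counts the accuracy works out to 20,000/23,000, slightly above
the rounded 85% figure, while the coverage of "I" is zero.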

Which brings me to my question. I am trying to adjust for these issues
in two ways:

- Using the class.weights parameter of svm. I couldn't quite get a sense
of how to use this parameter from the svm help page (or the introductory
papers on the libsvm web site). Is this supposed to be a vector of the
priors for the two classes, i.e. c(I=.15,B=.85)? (That gave me horrible
coverage of the "I" class.) Are there any 'correct' or conventional
values to use for this parameter in cases of unequal sample sizes (for
example, the 'complement' of the priors, c(I=0.85,B=0.15), on the grounds
that these values will give the two classes in the data set equal
weight)? Or is it simply another tunable parameter?
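
For concreteness, this is roughly what the 'complement' option looks like
as code. It is only a sketch on made-up data (scaled down tenfold, with
hypothetical predictors x1 and x2), not my real data set, and I am not
claiming that inverse-frequency weights are the intended or best use of
class.weights. The weights are passed as a named vector whose names match
the class labels. The svm call is skipped if e1071 is not installed.

```r
# Made-up two-class data with roughly the imbalance described above
# (scaled down 10x so it runs quickly); not the real data set.
set.seed(1)
n_I <- 300; n_B <- 2000
d <- data.frame(
  x1 = c(rnorm(n_I, mean = 1), rnorm(n_B, mean = 0)),
  x2 = c(rnorm(n_I, mean = 1), rnorm(n_B, mean = 0)),
  y  = factor(rep(c("I", "B"), times = c(n_I, n_B)), levels = c("I", "B"))
)

# Weights inversely proportional to class frequency, rescaled so that
# the majority class has weight 1. Names match the factor levels.
freq <- table(d$y)
w <- setNames(as.numeric(max(freq) / freq), names(freq))
print(w)

if (requireNamespace("e1071", quietly = TRUE)) {
  fit_plain    <- e1071::svm(y ~ x1 + x2, data = d)
  fit_weighted <- e1071::svm(y ~ x1 + x2, data = d, class.weights = w)

  # Coverage of "I" on the training data (optimistic; illustration only).
  cov_I <- function(fit) {
    cm <- table(truth = d$y, predicted = predict(fit, d))
    cm["I", "I"] / sum(cm["I", ])
  }
  cat("coverage of I, unweighted:", cov_I(fit_plain), "\n")
  cat("coverage of I, weighted:  ", cov_I(fit_weighted), "\n")
}
```

Only the ratio between the two weights differs from c(I=0.85,B=0.15)
here; whether the absolute scale matters as well is part of what I am
asking.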

- Choosing training sets that contain randomly selected but equal
numbers of cases of each class, and testing on the remaining cases. (This
is repeated to assess the stability of the accuracy and coverage values.)
Here I get mediocre accuracy but respectable coverage of "I". This is
not strictly an R question, but I thought someone on the list might have
had recent experience with these types of problems and could offer some
comments about such an approach.
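
The balanced-sampling scheme can be sketched in base R as follows. The
helper names balanced_split and counts, the class sizes (scaled down
tenfold) and the choice of 200 training cases per class are all
assumptions for illustration; the classifier itself is left out so that
only the splitting logic is shown.

```r
# Return row indices for a training set with n_per_class cases drawn at
# random from each class; all remaining rows form the test set.
balanced_split <- function(y, n_per_class) {
  train <- unlist(lapply(split(seq_along(y), y),
                         function(idx) sample(idx, n_per_class)))
  list(train = unname(train), test = setdiff(seq_along(y), train))
}

# Format class counts as a short string, e.g. "I 200, B 200".
counts <- function(v) {
  tab <- table(v)
  paste(names(tab), as.vector(tab), collapse = ", ")
}

set.seed(42)
y <- factor(rep(c("I", "B"), times = c(300, 2000)), levels = c("I", "B"))

# Repeat the split several times to judge stability of the estimates.
for (rep_i in 1:3) {
  s <- balanced_split(y, n_per_class = 200)
  cat("repeat", rep_i,
      "| train:", counts(y[s$train]),
      "| test:",  counts(y[s$test]), "\n")
}
```

Note that the held-out cases are then even more skewed towards "B" than
the full data set (100 "I" against 1,800 "B" in this sketch), which may
be part of why the overall accuracy looks mediocre.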

Many thanks,

 
-- 
Murad Nayal M.D. Ph.D.
Department of Biochemistry and Molecular Biophysics
College of Physicians and Surgeons of Columbia University
630 West 168th Street. New York, NY 10032
Tel: 212-305-6884	Fax: 212-305-6926



