[R] Stepwise SVM Variable selection
Noah Silverman
noah at smartmediacorp.com
Fri Jan 7 08:10:59 CET 2011
I have a data set with about 30,000 training cases and 103 variables.
I've trained an SVM (using the e1071 package) for a binary classifier
{0,1}. The accuracy isn't great.
I used a grid search over the C (cost) and G (gamma) parameters with an
RBF kernel to find the best settings.
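
For reference, a grid search of this kind might look like the sketch below. It assumes e1071's tune.svm() with 5-fold cross-validation and uses small synthetic stand-in data; the variable names, the data, and the 3 x 3 grid values are illustrative assumptions, not the settings from the actual run.

```r
library(e1071)  # provides svm(), tune.svm() and tune.control()

set.seed(1)
# Synthetic stand-in data (assumption): 200 cases, 5 variables, label in {0,1}.
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- factor(as.integer(x[, 1] + x[, 2] + rnorm(n) > 0), levels = c(0, 1))

# Hypothetical 3 x 3 grid over cost (C) and gamma (G) with an RBF kernel.
tuned <- tune.svm(x, y, kernel = "radial",
                  cost = 10^(-1:1), gamma = 10^(-2:0),
                  tunecontrol = tune.control(cross = 5))

print(tuned$best.parameters)   # cost/gamma pair with the lowest CV error
print(tuned$best.performance)  # its cross-validated error rate
```

tuned$performances holds one row per grid point, so the whole error surface can be inspected rather than only the best pair.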
I remember that for least squares, R has a nice stepwise function that
will try combining subsets of variables to find the optimal result.
Clearly, this doesn't exist for SVMs as a built-in function.
As an experiment, I simply grabbed the first 50 variables and repeated
the training/grid search procedure. The results were significantly
better. Since the data is VERY noisy, my guess is that dropping some of
the variables removed noise, which in turn improved the results.
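
A comparison like this one can be sketched with the cross argument of e1071's svm(), which reports k-fold cross-validated accuracy. The data below is a synthetic stand-in built so that only the first two columns carry signal; that construction, the column counts, and the helper name cv_accuracy are assumptions for illustration only.

```r
library(e1071)

set.seed(1)
# Synthetic stand-in (assumption): 300 cases, 20 variables, 18 of them noise.
n <- 300
x <- matrix(rnorm(n * 20), nrow = n)
y <- factor(as.integer(x[, 1] - x[, 2] + rnorm(n) > 0), levels = c(0, 1))

# Hypothetical helper: 5-fold cross-validated accuracy (in percent)
# for an RBF-kernel SVM trained on the given columns only.
cv_accuracy <- function(cols) {
  fit <- svm(x[, cols, drop = FALSE], y, kernel = "radial", cross = 5)
  fit$tot.accuracy
}

acc_all   <- cv_accuracy(1:20)  # every variable
acc_first <- cv_accuracy(1:10)  # only the first half of the variables
print(c(all = acc_all, first_half = acc_first))
```

Comparing the two numbers on the same folds, or repeating with several seeds, gives a rough idea of whether a difference is real or just cross-validation variance.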
With a grid of 100 parameter settings (10 for C, 10 for G) and 106
variables, trying every combination of variables and settings would be
prohibitively time-consuming.
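
To put a number on that, the arithmetic can be sketched in base R using the figures quoted above. The stepwise-style count assumes a greedy forward search that adds one variable per round and never backtracks; that is just one possible search scheme, shown here for scale.

```r
n_vars <- 106      # variable count quoted above
grid   <- 10 * 10  # 10 settings for C times 10 for G

# Exhaustive search: every non-empty subset, each with a full grid search.
subsets <- 2^n_vars - 1   # roughly 8.1e31 subsets
fits    <- subsets * grid # roughly 8.1e33 model fits

# Greedy forward search (assumption): round k tries each of the
# remaining n_vars - k + 1 variables, so 106 + 105 + ... + 1 subsets.
greedy_subsets <- n_vars * (n_vars + 1) / 2  # 5671
greedy_fits    <- greedy_subsets * grid      # 567100

print(format(c(exhaustive = fits, greedy = greedy_fits), digits = 3))
```

Each fit would still need cross-validation on about 30,000 cases, so even the greedy count is a substantial amount of computation.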
Can anyone suggest an approach to seek the ideal subset of variables for
my SVM classifier?
Thanks!