[R] Nominal variables in SVM?
Noah Silverman
noah at smartmediacorp.com
Wed Aug 12 20:53:39 CEST 2009
Hi,
The answers to my previous question about nominal variables have led me
to a more important question.
What is the "best practice" way to feed a nominal variable to an SVM?
For example:
color = ("red", "blue", "green")
I could translate that into an index so I wind up with
color = (1, 2, 3)
But my concern is that the SVM will then treat the values as points on a
numeric scale (so that "green" is somehow further from "red" than "blue"
is) rather than as discrete categories.
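To make the concern concrete, here is a minimal sketch in base R. The data and the names `color` and `idx` are made up for illustration. It only shows the distances that an index coding implies, which is what a distance-based kernel would see:

```r
# Hypothetical data: one observation per color, indexed in this order.
color <- factor(c("red", "blue", "green"),
                levels = c("red", "blue", "green"))

# Index coding: red = 1, blue = 2, green = 3.
idx <- as.integer(color)
names(idx) <- as.character(color)

# Pairwise distances between the coded values.
d <- as.matrix(dist(idx))
d
```

With this coding, red is twice as far from green as it is from blue, even though nothing in the data says so.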
Another thought would be to create 3 binary variables from the single
color variable, so I have:
red = (0,1)
blue = (0,1)
green = (0,1)
An example fed to the SVM would have one positive and two negative values
to indicate the color value,
e.g. for a blue example:
red = 0, blue = 1, green = 0
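The three-indicator idea can be sketched in base R with `model.matrix()`. This is only an illustration under my own assumptions: the four sample values and the name `onehot` are made up, and I am not claiming this is what any particular SVM package does internally.

```r
# Hypothetical data: four examples of the color variable.
color <- factor(c("red", "blue", "green", "blue"),
                levels = c("red", "blue", "green"))

# "- 1" drops the intercept, so every level gets its own 0/1 column
# instead of one level being absorbed into a baseline.
onehot <- model.matrix(~ color - 1)

# Rename the columns from "colorred", ... to plain level names.
colnames(onehot) <- levels(color)
onehot
```

Each row has exactly one 1, and the second row (a blue example) comes out as red = 0, blue = 1, green = 0.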
Or do any of the SVM packages handle this intelligently internally, so
that I don't have to mess with it? If so, do I need to be concerned
about the data being "translated" differently if the test data set doesn't
contain exactly the same values as the training set?
For example:
training data = color ("red", "blue", "green")
test data = color ("red", "green")
How would I be sure that the "red" and "green" examples get encoded the
same so that the SVM is accurate?
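One way this might be handled by hand, as a sketch in base R: fix the level set once from the training data and reuse it for the test data. The names `lvls`, `train_color`, `test_color` and the helper `encode()` are hypothetical, introduced only for this example.

```r
# Level set fixed once, taken from the training data.
lvls <- c("red", "blue", "green")

train_color <- factor(c("red", "blue", "green"), levels = lvls)
test_color  <- factor(c("red", "green"), levels = lvls)  # no "blue" here

# Hypothetical helper: one 0/1 column per declared level,
# including levels that do not occur in the data.
encode <- function(f) {
  m <- model.matrix(~ f - 1)
  colnames(m) <- levels(f)
  m
}

train_m <- encode(train_color)
test_m  <- encode(test_color)
test_m
```

Because both factors share the same levels, the test matrix keeps an all-zero "blue" column, and "red" and "green" land in the same columns as in the training matrix.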
Thanks in advance!!
-N