[R] Categorical Variables and Machine Learning
Lorenzo Isella
lorenzo.isella at gmail.com
Thu Feb 17 15:13:47 CET 2011
Dear All,
Please consider a dataframe like the one below (I am showing only a few
rows).
> role  degree  strength  weight  count  disparity  intermittency
> P         10        82   18017      2   2.317073   5.550314e-05
> P          7       529    4345     60   5.178466   6.904488e-03
> P          8       609    4382     10   6.204535   1.141031e-03
> D         42       230    6910     88   1.791153   6.367583e-03
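For anyone who wants to experiment, the four rows above can be typed
into R as follows. This is only a sketch: the object name dat is
arbitrary, and the full dataset (120 rows) is not reproduced here. The
role column is stored as a factor with the five possible levels, even
though only "P" and "D" occur in this excerpt.

```r
# The four example rows from the table above, entered by hand.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
role degree strength weight count disparity intermittency
P 10 82 18017 2 2.317073 5.550314e-05
P 7 529 4345 60 5.178466 6.904488e-03
P 8 609 4382 10 6.204535 1.141031e-03
D 42 230 6910 88 1.791153 6.367583e-03
")

# Declare role as a categorical variable with all five possible values.
dat$role <- factor(dat$role, levels = c("P", "D", "C", "N", "A"))

str(dat)
```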
The role variable is categorical and can take only a few values
("P","D","C","N","A"). Each row refers to a different individual, for
whom I also collected some numeric properties (namely degree, strength,
weight, count, disparity and intermittency, as in the table above).
My goal is to find the most suitable property (or combination of
properties) to predict the role of an individual. It looks like a typical
machine learning problem, except that the variable to predict is
categorical, i.e. it is a classification problem.
I am drowning in the wealth of R packages for machine learning, but I
really would like something simple and easy to use (consider that the
dataset covers only 120 individuals, so performance is not a problem).
Any suggestion is appreciated.
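To make the question concrete, here is a minimal sketch of the kind of
analysis meant, using a classification tree from the rpart package
(one of the recommended packages distributed with R). It is one
possible approach, not necessarily the best one. The data below are
simulated purely for illustration: the object sim, the Poisson and
exponential distributions, and the assumption that only degree depends
on role are all made up and say nothing about the real dataset.

```r
library(rpart)

set.seed(1)
n <- 120
roles <- c("P", "D", "C", "N", "A")
role <- factor(sample(roles, n, replace = TRUE), levels = roles)

# Hypothetical mean degree per role; the other columns are pure noise.
mean_degree <- c(P = 8, D = 40, C = 20, N = 15, A = 5)

sim <- data.frame(
  role          = role,
  degree        = rpois(n, lambda = mean_degree[as.character(role)]),
  strength      = rpois(n, lambda = 300),
  weight        = rpois(n, lambda = 5000),
  count         = rpois(n, lambda = 30),
  disparity     = rexp(n, rate = 0.3),
  intermittency = rexp(n, rate = 200)
)

# Classification tree: role is the response, all other columns predictors.
fit <- rpart(role ~ ., data = sim, method = "class")

# Predicted class for each individual and the resulting confusion table.
pred <- predict(fit, newdata = sim, type = "class")
print(table(observed = sim$role, predicted = pred))

# Variables used in the splits, and cross-validated error per tree size.
print(fit$variable.importance)
printcp(fit)
```

The confusion table here is computed on the same rows used for fitting,
so it is optimistic; the xerror column printed by printcp() gives a
cross-validated estimate instead. Multinomial logistic regression
(multinom in the nnet package) or linear discriminant analysis (lda in
the MASS package) would be comparably simple alternatives.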
Cheers
Lorenzo